Executive  Summary


This  report  contains  the  proceedings  from  the  First  International  Workshop  on  Reengineering 
Towards  Product  Lines  (R2PL)  2005,  which  was  held  on  November  10th,  2005  in  Pittsburgh, 
Pennsylvania,  USA  and  colocated  with  the  Working  Conference  on  Reverse  Engineering 
(WCRE)  2005  and  WICSA  2005 — the  Working  Institute  of  Electrical  and  Electronics  Engi¬ 
neers/International  Federation  for  Information  Processing  (IEEE/IFIP)  Conference  on  Soft¬ 
ware  Architecture.  This  report  consists  of  an  overview  of  an  invited  presentation,  a  set  of  po¬ 
sition  papers,  and  details  of  the  workshop’s  outcomes. 




1  Background


Today,  software-intensive  systems  are  developed  more  and  more  using  product  line  ap¬ 
proaches.  These  approaches  require  the  definition  of  a  product  line  architecture  that  implicitly 
or  explicitly  specifies  some  degree  of  variability.  This  variability  is  used  to  instantiate  con¬ 
crete  software  product  instances.  A  product  line  approach  not  only  implies  reuse  of  architec¬ 
ture-level  design  knowledge,  it  also  facilitates  reuse  of  implementation-level  artifacts,  such  as 
source  code  and  executable  components.  The  use  of  software  product  lines  can  reduce  the 
cost  of  developing  new  products  significantly. 

In  practice,  software  products  are  usually  not  developed  from  scratch.  Software  product  lines 
are  typically  introduced  following  an  evolutionary  approach.  First,  a  product  line  architecture 
is  defined  based  on  an  initial  set  of  products.  Then,  the  scope  of  the  product  line  is  gradually 
extended  by  incorporating  more  existing  and  new  products.  Before  a  product  line  is  extended,
its  suitability  for  incorporating  more  products  needs  to  be  evaluated,  as  well  as  the  extent  to 
which  the  new  and  currently  included  products  conform  to  the  product  line  architecture. 

For  companies  adopting  a  product  line  approach  for  their  software  development,  the  problem 
remains  of  how  to  reuse  as  much  as  possible  of  the  existing  legacy  development  artifacts. 
Reuse  can  be  applied  to  the  definition  and  implementation  of  a  product  line  architecture  and 
to  the  specifications  and  implementation  of  concrete  product  instances  based  on  (legacy) 
software  development  artifacts.  In  this  workshop  [Graaf  05,  R2PL  05],  we  discuss  the  use  of 
reverse  engineering  and  reengineering  technology  to  solve  the  problems  described  above. 

The  papers  included  in  this  report  appear  exactly  as  they  did  in  the  original  presentations; 
they  have  not  been  edited  further  (aside  from  adjusting  their  section  numbers  for  the  new  lay¬ 
out  in  this  report). 
3  Invited  Talk:  Consolidating  Software
Variants  into  Software  Product  Lines — A 
Research  Outline 

Rainer  Koschke 

University  of  Bremen 
Germany 


Abstract 

Software  product  lines  often  arise  from  a  set  of  variants  of  a  common  code  basis  that  have 
been  individually  adapted  to  a  particular  requirement  variability.  This  ad-hoc  and  unplanned 
approach  causes  serious  maintenance  problems.  Migrating  such  variants  into  an  organized 
software  product  line  promises  better  maintainability. 

In  this  talk,  I  shall  outline  our  3-year  research  program  aiming  at  consolidating  software
variants  into  software  product  lines.  We  are  tackling  the  problem  both  at  the  source  code  level
and  architectural  level.  We  are  adapting  and  extending  techniques,  such  as  clone  detection, 
feature  location,  protocol  recovery,  and  reflexion-based  reconstruction  that  we  have  so  far 
applied  only  to  individual  systems. 
4  Quality-Driven  Conformance  Checking  in
Product  Line  Architectures 


Femi  G.  Olumofin 
Vojislav  B.  Misic 

University  of  Manitoba,  Winnipeg, 
Manitoba,  Canada 


Abstract 

Software  product  lines  are  often  devel¬ 
oped  through  reengineering  existing  prod¬ 
ucts  and  legacy  applications.  In  such  cases 
it  is  not  uncommon  for  the  behavioural 
and  quality  characteristics  of  individual 
product  architectures  to  be  inconsistent 
with  those  of  the  common  architecture. 
Successful  development  of  product  lines
dictates  that  those  inconsistencies  be  resolved.
The  resolution  process  involves
bringing  the  product  architecture  into 
structural,  semantic  and  quality  attribute- 
related  congruence  with  the  common  ar¬ 
chitecture.  Additional  steps  must  be  taken 
to  ensure  their  continued  conformance  in 
order  to  facilitate  subsequent  maintenance 
and  evolution  activities.  In  this  paper,  we 
describe  a  simple  design-time  technique 
that  aims  to  ensure  that  quality  attribute 
responses  of  individual  product  architec¬ 
tures  are  in  conformance  with  those  of  the 
common  architecture.  The  technique  is 
based  on  the  concept  of  variation  points. 

4.1  Introduction 

For  more  than  a  decade,  software  architec¬ 
ture  has  been  steadily  gaining  importance 
as  the  most  effective  vehicle  for  the  de¬ 
velopment  of  complex  software  intensive 
systems.  Architecture-based  design  offers 


unmatched  flexibility  and  allows  crucial 
insights  to  be  obtained  very  early  in  the 
design  cycle.  Architectural  abstraction 
avoids  complex  code  level  details  while 
making  component  structures  and  interre¬ 
lationships  explicit.  In  this  manner,  the  use 
of  architecture  facilitates  human  under¬ 
standing  of  the  system  as  well  as  reason¬ 
ing  about  quality  characteristics  and  at¬ 
tributes.  It  should  come  as  no  surprise, 
then,  that  the  reengineering  of  existing 
systems  and  legacy  applications — 
recovering  their  structure  in  order  to  de¬ 
velop  new,  functionally  equivalent  but 
improved  systems — often  focuses  on  re¬ 
covering  or  reconstructing  the  architecture 
in  the  form  of  a  product.  Most  such  efforts 
are  motivated  by  changes  in  quality  attrib¬ 
utes,  such  as  extendibility  and  maintain¬ 
ability,  rather  than  by  the  need  for  func¬ 
tional  changes  and  enhancements  [3,  10]. 
For  example,  consider  a  system  that  has 
undergone  several  maintenance  cycles 
which  included  functionality  enhance¬ 
ments.  While  the  system  itself  may  be  in 
working  order,  the  documentation  com¬ 
plexity  and,  possibly,  inconsistency  make 
further  maintenance  difficult.  The  first 
thought  would  be  to  leave  the  system  as  it 
is  and  reconstruct  the  documentation  only; 
but  a  better  way  is  to  disregard  the  docu¬ 
mentation  and  recover  the  system  archi¬ 
tecture  from  the  system  itself.  Oftentimes, 
architecture  recovery  is  the  first  step  towards
reengineering  the  entire  system.

All  of  the  aforementioned  advantages  are 
even  more  important  in  the  case  of  soft¬ 
ware  product  families  or  product  lines  [5]: 
sets  of  related  yet  distinct  software  inten¬ 
sive  systems  developed  from  the  same 
base  architecture.  In  the  product  line  ap¬ 
proach,  requirements  or  features  common 
to  all  the  products  are  used  as  the  basis  for 
the  so-called  core  architecture,  or  CA. 
Requirements  which  are  specific  to  some 
of  the  products  only,  but  not  all  of  them, 
are  represented  as  variation  points  in  the 
CA.  (It  is  common  to  refer  to  the  two  sets 
of  requirements  as  commonality,  or  com¬ 
monalities,  and  variability,  respectively.) 
Individual  products  are  then  developed  to 
address  the  specific  sets  of  requirements. 
In  one  approach,  individual  products  are 
directly  developed  from  the  CA  by  replac¬ 
ing  the  variation  points  with  product- 
specific  component  instances,  called  vari¬ 
ants.  This  approach  is  often  used  in  sim¬ 
pler  cases — i.e.,  when  the  number  of  indi¬ 
vidual  products  and/or  variation  points  is 
not  high. 

In  an  alternative  approach,  the  CA  is  used 
to  instantiate  a  number  of  separate  product 
architectures  or  PAs,  which  correspond  to 
individual  products.  The  PA  is  created 
from  the  CA  by  exercising  the  built-in 
variation  points.  The  actual  products  are 
then  developed  from  the  corresponding 
PAs.  This  dual  form  of  representation  of
the  architecture  (i.e.,  CA  and  PA)  is  typical
of  software  product  lines  [5,  6].

Yet  more  problems  arise  when  the  product 
line  development  path  involves  the  reuse 
of  existing  products.  In  most  cases,  exist¬ 
ing  products  and  legacy  systems  were 


built  with  little  care  (or  none  at  all)  for 
consistency  and  quality,  thus  encumbering 
the  identification  of  commonalities  and 
variability  required  for  the  product  line 
approach.  Once  identified  and  specified, 
the  CA  and  the  individual  PAs  may  differ 
significantly,  in  particular  with  regard  to 
consistency  and  prioritization  of  quality 
attributes.  Any  inconsistencies  and  differ¬ 
ences  in  the  architectures  recovered  from 
the  existing  system  must  be  resolved  in 
the  product  line  architectures — successful 
development  of  the  reengineered  system  is 
contingent  upon  the  design  of  both  CA 
and  PAs  being  quality  attribute-driven  and 
conflict-free. 

In  this  paper,  we  present  a  design-time 
technique  for  maintaining  conformance 
between  the  reengineered  and  evolving 
CA  and  individual  product  architectures. 
The  technique  is  based  on  the  concept  of 
variation  points,  which  are  exploited  in  a 
systematic  fashion  in  order  to  constrain 
the  individual  PAs  to  be  consistent  with 
the  CA.  While  the  approach  described  is 
particularly  suited  to  reengineering  prod¬ 
uct  lines,  its  generality  makes  it  also  ap¬ 
plicable  for  validation  of  product  line  ar¬ 
chitectures  developed  ‘from  scratch’  as 
well  as  those  developed  using  the  revolutionary
approach  [2].  The  paper  is  organized
as  follows.  In  Section  4.2,  we  briefly
describe  the  challenges  of  ensuring  quality 
conformance  between  the  CA  and  the  PAs, 
and  discuss  some  earlier  work  that  touches 
this  issue.  Section  4.3  introduces  our  tech¬ 
nique  based  on  variation  points,  together 
with  a  small  example  that  illustrates  the 
use  of  the  technique.  Finally,  Section  4.4 
summarizes  the  paper  and  highlights  some 
open  issues  for  further  work. 
4.2  Challenges  and  Related
Work 

As  noted  above,  the  product  line  architec¬ 
ture  consists  of  a  core  architecture  (CA) 
which  is  used  as  the  basis  for  developing  a 
number  of  individual  product  architectures 
(PAs).  The  CA  is  necessarily  underspeci¬ 
fied,  while  the  individual  PAs  must  be 
fully  specified  since  the  actual  products 
will  be  derived  from  them.  However,  the 
set  of  quality  attributes  for  a  given  PA  may 
significantly  differ  from  that  of  the  under¬ 
lying  CA,  and  even  priorities  of  different 
attributes  may  differ.  To  consider  the  in¬ 
terplay  between  the  quality  attributes  of 
individual  PAs  and  those  of  the  CA,  we
need  to  start  by  considering  the  CA.  The 
quality  attribute  goals  in  the  CA  are  ad¬ 
dressed  through  the  so-called  sensitivity 
and  tradeoff  points  [1,  4].  A  sensitivity
point  is  an  area  of  the  architecture  which
determines  the  response  of  at  least  one
quality  attribute.  A  tradeoff  point  is  an
area  of  the  architecture  which  determines 
the  responses  of  two  or  more  quality  at¬ 
tributes,  usually  in  opposing  ways.  (Note 
that  each  tradeoff  point  is  a  sensitivity 
point  by  default.) 
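
The two definitions above can be illustrated with a small sketch. It is not taken from the paper; the class `Area`, the function names, and the example areas are hypothetical and serve only to show that every tradeoff point is also a sensitivity point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """An area of the architecture and the quality attributes it determines."""
    name: str
    attributes: frozenset


def is_sensitivity_point(area: Area) -> bool:
    # Determines the response of at least one quality attribute.
    return len(area.attributes) >= 1


def is_tradeoff_point(area: Area) -> bool:
    # Determines the responses of two or more quality attributes.
    return len(area.attributes) >= 2


def classify(area: Area) -> str:
    """Return the most specific label that applies to the area."""
    if is_tradeoff_point(area):
        return "tradeoff"
    if is_sensitivity_point(area):
        return "sensitivity"
    return "neutral"


# Hypothetical areas, for illustration only.
AREAS = [
    Area("message_encryption", frozenset({"security", "performance"})),
    Area("request_cache", frozenset({"performance"})),
    Area("about_dialog", frozenset()),
]

if __name__ == "__main__":
    for area in AREAS:
        print(area.name, classify(area))
```

Because the tradeoff test is strictly stronger than the sensitivity test, any area labelled "tradeoff" here would also pass `is_sensitivity_point`.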

The  problem  lies,  of  course,  in  that  the 
individual  PAs  have  quality  attributes  and 
priorities  of  their  own.  Satisfying  those 
attributes  may  cause  conflict  with  the  de¬ 
cisions  made  in  the  CA,  thus  compromis¬ 
ing  the  quality  attributes  that  should  be 
common  to  both  the  CA  and  all  PAs. 
Namely,  the  changes  that  fully  specify  an 
underspecified  CA,  and  thus  instantiate 
the  particular  PA,  are  made  in  an  area  with 
a  variation  point — the  requirement  spe¬ 
cific  to  the  PA  but  not  present  in  the  CA 
itself.  If  the  variation  point  overlaps  with  a 
sensitivity  point  of  the  original  CA,  the


corresponding  quality  attribute  may  be 
affected.  If  the  variation  point  overlaps 
with  a  tradeoff  point,  several  of  the  origi¬ 
nal  attributes  will  be  affected.  Now,  each 
of  the  individual  PAs  instantiates  a  par¬ 
ticular  variation  point  from  the  underlying 
CA  in  its  own  fashion.  As  a  result,  con¬ 
formance  checking  between  the  CA  and 
individual  PAs  is  a  complex  process,  and 
the  problem  is  not  made  any  easier  by  the 
fact  that  there  may  be  quite  a  few  PAs  de¬ 
rived  from  a  single  CA. 

Several  authors  have  identified  this  prob¬ 
lem  in  the  context  of  architecture  reengi¬ 
neering.  In  most  cases,  such  reengineering 
is  based  on  updating  the  ‘as  designed’  ar¬ 
chitecture  of  a  system  from  the  ‘as-built’ 
architecture  reconstructed  by  reverse  en¬ 
gineering.  Once  the  architectural  descrip¬ 
tion  of  the  existing  system  is  accurately 
specified,  it  can  be  modified  in  order  to 
fulfill  the  emergent  quality  goals  of  the 
new  target  system. 

Bengtsson  and  Bosch  present  an  iterative, 
scenario-based  reengineering  method  for 
transforming  software  architectures  to 
provide  desired  quality  attribute  responses  [3].

QADSAR  [13]  is  a  quality  attribute  scenario
driven  reverse  engineering  method
for  architectures  of  existing  systems,
with  tool  support  provided  by  ARMIN.  The
goal  of  a  QADSAR  reconstruction  is  to 
provide  architectural  description  and  in¬ 
formation  on  architectural  drivers  to  en¬ 
able  qualitative  architectural  analysis. 

Stoermer  et  al.  [12]  provide  a  codification
of  six  practice  patterns  for  architectural
reverse  engineering.  These  patterns
are  described  with  a  name,  context  of  application,
concise  statement  of  the  problem  in
the  context,  an  example  illustration  in  an
industrial  context,  and  the  expected  solution/delivery
from  applying  the  pattern.
The  paper  also  describes  some  common
approaches  to  reverse  engineering,  including
tool-supported  approaches.  The  suitability
of  different  approaches  (and  the
accompanying  tools)  for  use  in  the  practice
patterns  is  also  discussed.  The  analysis
revealed  a  lack  of  adequate
coverage  of  the  practice  patterns  by
the  existing  approaches.

Finally,  Tahvildari  et  al.  [14]  proposed  a 
quality-driven  software  reengineering 
framework  similar  to  that  of  Bengtsson 
and  Bosch  [3].  This  framework  is  based 
on  the  use  of  desirable  target-system 
qualities  to  define  and  guide  the  reengineering.
According  to  Stoermer’s  practice
pattern  catalogue  [12],  this  framework
may  be  categorized  under  the  quality  attribute
changes  practice  pattern.  In  this  pattern,
legacy  systems  are  reengineered  to
improve  some  desired  quality  attribute
responses,  such  as  performance  or  maintainability.

4.3  Variation  Point  Concepts 
Usage 

In  order  to  ensure  quality  congruence  be¬ 
tween  the  common  architecture  and  indi¬ 
vidual  product  architectures,  both  the  ex¬ 
isting  and  the  emerging  ones,  we  make  use 
of  the  concept  of  variation  points.  Varia¬ 
tion  points  are  architectural  placeholders 
for  augmenting  the  CA  with  behavioural 
extensions.  They  are  instantiated  as  con¬ 
crete  variants  in  individual  product  archi¬ 
tectures.  The  sensitivity  points  are  those 
architectural  decisions  that  affect  one  or 
more  quality  goals  [8].  For  example,  the


encryption  of  sensitive  message  exchange 
between  two  components  may  improve 
the  security  quality  of  a  software-intensive 
system.  The  architectural  decision  to  in¬ 
troduce  cryptographic  components  be¬ 
tween  the  two  communicating  compo¬ 
nents  is  a  sensitivity  point  intended  to 
implement  security  insofar  as  message 
exchange  between  the  two  components  is 
concerned. 

Architectural  decisions  made  in  the  proc¬ 
ess  of  defining  the  CA,  and  subsequently 
found  to  be  sensitivity  points  to  one  or 
more  quality  attributes,  continue  to  remain 
valid  for  individual  product  architectures. 
A  possible  exception  would  be  the  case  in 
which  the  creation  of  a  PA  involves  the 
addition  of  component  variants  to  those 
parts  of  the  architecture  which  interact 
with  the  sensitivity  points.  In  the  example 
given  above,  consider  adding  a  third  com¬ 
ponent  to  periodically  receive  exception 
messages  from  both  components.  If  such 
notification  messages  to  this  third  compo¬ 
nent  are  not  similarly  encrypted,  the  secu¬ 
rity  of  the  system  may  be  jeopardized. 

An  area  of  the  architecture,  which  is  a 
sensitivity  point  and  which  contains  at 
least  one  variation  point,  will  be  referred 
to  as  an  evolvability  point.  Such  varia- 
tion/evolvability  points  deserve  special 
treatment,  as  they  have  the  potential  to 
alter  (and,  possibly,  damage)  the  quality  of 
the  architecture(s).  In  order  to  defuse  that 
potential,  each  evolvability  point  in  the 
CA  is  accompanied  by  suitable  guidelines 
to  constrain  or  guide  subsequent  PA  de¬ 
sign  decisions  and  conformance  checking. 
Thus,  the  developers  are  warned  against 
making  design  decisions  in  a  PA  that 
could  invalidate  the  quality  goals  already 
identified  in  the  CA. 
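
As a rough illustration of the definition above, the following sketch derives evolvability points from two simple mappings. The data layout, the function name, and the example entries (loosely based on the encryption example) are assumptions made for this sketch and are not part of the technique as published.

```python
def evolvability_points(sensitivity_points, variation_points):
    """Find sensitivity points that contain at least one variation point.

    sensitivity_points: dict mapping an area name to the set of quality
                        attributes that the area affects.
    variation_points:   dict mapping a variation point name to the name of
                        the area that contains it.
    Returns a dict mapping each evolvability point (area name) to the sorted
    list of variation points located in it.
    """
    found = {}
    for vp_name, area in variation_points.items():
        # An area counts only if it affects at least one quality attribute.
        if sensitivity_points.get(area):
            found.setdefault(area, []).append(vp_name)
    return {area: sorted(names) for area, names in found.items()}


# Hypothetical data: messaging between components is security-sensitive,
# and a product may plug an exception listener into that area.
SENSITIVITY_POINTS = {"component_messaging": {"security"}}
VARIATION_POINTS = {
    "exception_listener": "component_messaging",
    "report_layout": "reporting",
}

if __name__ == "__main__":
    print(evolvability_points(SENSITIVITY_POINTS, VARIATION_POINTS))
```

In this example only `component_messaging` qualifies, so it is the one place where the CA designers would attach a guideline for product architects.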
Figure  4-1:  Example  product  line  architecture  adapted  from  [7]

(Unshaded  boxes  represent  mandatory  components;  vertically  striped  boxes  represent  alternative  com¬ 
ponents;  shaded  boxes  represent  optional  components.) 


As  the  CA  changes,  the  evolvability  con¬ 
straints  (or  quality  attributes  conformance 
constraints)  are  updated  accordingly  to 
guide  future  design  of  the  PAs.  The 
evolvability  points  also  help  simplify 
maintenance  because  the  architects  would 
be  rightly  guided  to  those  critical  design 
decisions  that  control  quality  attribute  re¬ 
sponses. 

As  an  example,  consider  the  architecture 
shown  in  Figure  4-1,  which  is  made  up  of
three  complex  (or  composite)  components 
CC1,  CC2,  and  CC3.  Each  of  these  com¬ 
ponents  is  in  turn  made  up  of  a  number  of 
primitive  components.  In  the  product  line 
approach,  those  primitive  components  can 
be  identified  as  mandatory  (or  common), 
optional,  or  alternative.  Mandatory  com¬ 
ponents,  by  definition,  are  fully  specified 
in  the  CA  and  are  always  present  in  any 
PA.  Optional  components  are  underspeci¬ 
fied  as  variation  points  in  the  CA;  they 
can  become  fully  specified  as  components 
(or  variants)  in  a  given  PA,  or  they  will 
not  be  present  at  all.  Finally,  alternative 
components  are  underspecified  as  varia¬ 
tion  points  in  the  CA  but  must  become 


fully  specified  into  some  component  (or 
variant)  in  the  PA. 
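
The rules for the three kinds of components can be sketched as a simple structural check of a PA against its CA. This is a minimal illustration under assumptions: the representation of a CA as a name-to-kind mapping, the function `check_pa`, and the optional component `PC32` are hypothetical; PC21, PC23, and PC24 are used with the kinds the text assigns to them.

```python
from enum import Enum


class Kind(Enum):
    MANDATORY = "mandatory"
    OPTIONAL = "optional"
    ALTERNATIVE = "alternative"


def check_pa(ca, pa):
    """Check a product architecture against the core architecture.

    ca: dict mapping a component name to its Kind in the CA.
    pa: dict mapping a component name to the concrete component or variant
        used in the product.
    Returns a list of human-readable violations (empty if the PA conforms).
    """
    violations = []
    for name, kind in ca.items():
        if kind is Kind.MANDATORY and name not in pa:
            violations.append(f"{name}: mandatory component missing")
        elif kind is Kind.ALTERNATIVE and name not in pa:
            violations.append(f"{name}: alternative not bound to a variant")
        # OPTIONAL components may be absent, so there is nothing to check.
    for name in pa:
        if name not in ca:
            violations.append(f"{name}: not defined in the CA")
    return violations


# PC32 is a hypothetical optional component added for illustration.
CA = {
    "PC23": Kind.MANDATORY,
    "PC24": Kind.MANDATORY,
    "PC21": Kind.ALTERNATIVE,
    "PC32": Kind.OPTIONAL,
}

if __name__ == "__main__":
    print(check_pa(CA, {"PC23": "PC23", "PC24": "PC24", "PC21": "variant_a"}))
    print(check_pa(CA, {"PC23": "PC23", "PC24": "PC24"}))
```

A check of this kind covers structure only; it says nothing yet about whether the chosen variants preserve the quality attribute responses.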

In  the  definition  of  this  architecture,  de¬ 
sign  decisions  that  interact  with  one  or 
more  quality  attributes  (i.e.,  sensitivity 
points)  are  assumed  to  be  located  in  some 
of  the  components.  Let’s  assume  that  per¬ 
formance  and  availability  are  the  two 
quality  attribute  goals  of  the  highest  prior¬ 
ity.  We  shall  consider  two  scenarios  in 
relation  to  the  architecture  illustrated  in 
Figure  4-1:  in  the  first  scenario,  the  architec¬ 
ture  is  taken  to  be  a  product  architecture 
(PA),  while  in  the  second,  it  is  taken  to  be 
the  core  architecture  (CA). 

Scenario  1:  architecture  is  a  PA 

If  the  architecture  in  Figure  4-1  is  a  prod¬ 
uct  architecture,  then  the  shaded  and  un¬ 
shaded  boxes  are  fully  specified  architec¬ 
tural  components  (i.e.,  primitive 
components).  In  this  scenario,  we  will 
consider  two  possibilities  concerning  the 
nature  of  the  sensitivity  points. 

In  one  case,  let  the  sensitivity  points  be 
located  in  the  mandatory  components 
whose  design  decisions  are  preset  in  the 
CA.  For  example,  the  sensitivity  point  interacting
with  performance  is  localized  in  PC23,
while  that  of  availability  is  localized  in 
PC24.  Since  both  sensitivity  points  are 
localized  in  mandatory  components,  each 
individual  PA  inherits  those  sensitivity 
points  intact.  With  them,  performance  and 
availability  qualities  are  inherited  from  the 
CA.  As  a  result,  the  availability  and  per¬ 
formance  quality  will  always  be  met  in 
this  PA.  In  fact,  every  product  built  from 
that  CA  is  guaranteed  to  provide  the  preset 
quality  responses  for  performance  and 
availability. 

Alternatively,  one  or  both  quality  attrib¬ 
utes  may  be  localized  in  an  optional  or 
alternative  component.  Let  us  assume  that 
the  performance  quality  of  this  PA  is  de¬ 
termined  through  the  appropriate  design 
decisions  of  CC2.  Further,  assume  those 
design  decisions  are  jointly  localized  in 
components  PC24  (mandatory)  and  PC21
(alternative).  The  design  decisions  of 
PC24  are  determined  during  the  CA  defi¬ 
nition,  while  those  of  PC21  are  deter¬ 
mined  in  this  particular  PA  definition.  If 
the  correct  guarantees  for  performance  are 
provided  through  PC24,  but  not  through 
PC21,  the  desired  performance  response 
may  not  be  guaranteed.  To  avoid  this,  the 
PA  must  correctly  specialize  PC21  from 
its  variation  point  definition  in  the  CA;  to 
this  end,  relevant  design  decisions  need  to 
be  guided  or  constrained  in  an  appropriate 
way,  as  described  below. 

Scenario  2:  architecture  is  the  CA 

In  this  second  scenario,  let  us  assume  that 
the  architecture  in  Figure  4-1  is  the  CA,  in
which  case  only  the  white  boxes  are  fully 
specified,  while  the  shaded  and  striped 
ones  correspond  to  variation  points  of  ei¬ 


ther  optional  or  alternative  type.  As  in  the 
previous  scenario,  there  are  two  possible 
cases  to  consider. 

If  all  the  sensitivity  points  in  this  architec¬ 
ture  are  located  in  mandatory  components 
(which  should  be  the  goal  of  every  prod¬ 
uct  line  design),  then  the  CA  design  deci¬ 
sions  will  address  the  common  quality  of 
all  products. 

However,  the  above  case  is  not  always
what  is  obtained  in  reality.  Oftentimes,
there  are  two  or  more  sensitivity  points
localized  both  in  areas  that  have  been  fully
specified  (e.g.,  mandatory  components)
and  in  areas  that  are  not  fully  specified
(variation  points).  The  architects  can  only
design  to  fulfill  the  quality  goal  of  the  mandatory
components  and  expect  product  architects
to  fulfill  their  part  in  designing  the
variants  for  the  appropriate  quality  response.
If  the  teams  are  different,  this  may
be  hard  to  do  without  duplication  of  effort.

To  ensure  conformance  of  the  PA  design
decisions  to  those  of  the  CA,  in  order  to
fulfill  a  common  quality  goal,  an  evolvability
point  and  evolvability  constraint  pair
is  needed.  Not  every  variation  point
in  the  CA  is  an  evolvability  point,  but
only  those  that  interact  with  a  sensitivity
point.  The  designers  of  the  CA  will  accompany
such  evolvability  points  with
constraints/guidelines  to  help  product  architects
in  their  work.

An  evolvability  constraint  is  a  statement
about  an  evolvability  point  that  guides
product  architecture  creation  in  order  to
fulfill  desired  quality  goals.  Like  any
other  form  of  constraint,  it  may  be  described
using  the  syntax  and  semantics  of
an  ADL  or  other  constraint  language.  The
constraint  may  restrict  variant  components 
in  their  interaction  protocol,  internal 
states,  architectural  styles,  implementation 
or  usage  [9]  in  order  to  fulfill  some  quality 
goals. 

The  combined  use  of  evolvability  points
and  evolvability  constraints  ensures  that
PAs  remain  in  conformance  with  the  CA.
The  following  is  a  description  of  an 
evolvability  point  (EP)  and  its  correspond¬ 
ing  evolvability  constraint  (EC),  as  de¬ 
fined  in  a  recent  case  study  of  a  product 
line  called  btLine,  in  the  domain  of  mobile 
and  electronic  payment  systems. 

EP:  The  response  time  of  the  btLine  prod¬ 
uct  to  tasks  delegated  to  it  is  dependent  on 
whether  it  is  interfaced  directly  to  the  leg¬ 
acy  and  back  office  systems  of  its  host 
organisation  or  not.  The  fact  that  the  design
decision  on  product  integration  varies 
from  product  to  product  makes  it  an 
evolvability  point. 

EC:  To  enhance  response  time  for  transactions
involving  a  product,  external  data
requests  from  within  the  product  (e.g.,  the  balance
of  a  customer  account  in  the  host
banking  system)  must  not  involve  complicated
and  time-consuming  queries.  Alternatively,
an  external  integration  mechanism
may  be  deployed  to  synchronize
account  details  between  the  bank  systems
and  their  local  btLine  product,  with
guidance  from  the  btLine  team.  Better
still,  outbound  requests  from  a  btLine
product  to  external  systems  may  be  routed
to  a  low-traffic  data  source  or  business
component  for  improved  response  time.
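
The EC above is stated in prose. One way such a constraint might be made mechanically checkable is sketched below. The strategy labels, the request names, and the function `ec_violations` are hypothetical and are not taken from the btLine case study; they merely encode the three options the EC permits.

```python
# Integration strategies that the EC permits (hypothetical labels):
# a simple direct query, a locally synchronized copy of the data, or
# routing to a low-traffic data source or business component.
ALLOWED_STRATEGIES = frozenset(
    {"simple_query", "synchronized_local_copy", "low_traffic_source"}
)


def ec_violations(requests):
    """Return the names of external data requests that break the EC.

    requests: dict mapping a request name to the label of the integration
              strategy chosen for it in a given product architecture.
    """
    return sorted(
        name
        for name, strategy in requests.items()
        if strategy not in ALLOWED_STRATEGIES
    )


# Hypothetical design decisions for one product.
PRODUCT_REQUESTS = {
    "account_balance": "synchronized_local_copy",
    "transaction_history": "complex_query",
}

if __name__ == "__main__":
    print(ec_violations(PRODUCT_REQUESTS))
```

A product architect could run such a check whenever the integration design at the evolvability point changes, which is the kind of guidance the EC is meant to give.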


4.4  Conclusion  and  Open  Issues

We  have  highlighted  the  problem  context  and  the
challenges  of  ensuring  quality  attribute
conformance  between  a  product  line’s
CA  and  its  PAs.  We  then
described  a  technique  for  implementing
this  form  of  conformance  during
product  development  and
maintenance.  The  technique  focuses  on
identifying  variation  points  that  interact
with  sensitivity  points.  Those  points,  referred
to  as  evolvability  points,  are  accompanied
by  suitable  guidelines  and/or
constraints.  The  constraints  inhibit  any  PA
design  decisions  from  degrading  the  preset
quality  attribute  responses  of  the  CA.
Adhering  to  the  constraints  and  guidelines
would  ensure  that  the  quality  attributes  of
the  PA  are  in  conformance  with  those  of
the  CA.

The  main  contributions  of  this  approach 
include  its  architecture-centric  focus  for 
reasoning  about  quality  attributes  confor¬ 
mance  of  the  product  architectures  to  the 
CA  and  systematic  use  of  variation  points 
to  constrain  product  architectures  from 
deviating  from  the  preset  qualities  of  the 
CA.  Both  of  these  should  facilitate  under¬ 
standing  of  the  interactions,  conflicts,  and 
tradeoffs  between  quality  attributes  of  dif¬ 
ferent  forms  of  architecture  encountered  in 
product  line  development. 

Many  of  the  issues  relating  to  quality  attribute
conformance  between  the  CA  and
the  PAs  are  still  open.  First  and  foremost,
considerable  advances  have  been  made 
regarding  architecture  recovery  from  ex¬ 
isting  systems — but  extraction  of  CA  and 
PAs  from  such  systems  is  still  an  open 
area  for  research. 
Second,  there  is  a  need  for  characterizing
those  areas  of  the  CA  that  do  not  feature 
any  variation  points,  but  that  have  the  po¬ 
tential  of  determining  qualities  both  in  the 
CA  and  the  PAs. 

Other  open  questions  include:  What  ap¬ 
proach  can  be  used  to  resolve  quality  at¬ 
tributes  conflicts  between  the  CA  and  PA? 
How  responsive  is  the  current  result  to 
product  line  development  in  the  evolu¬ 
tionary  approach  involving  reverse  engi¬ 
neering  or  reengineering?  What  is  the  im¬ 
pact  of  the  CA  evolving  in  terms  of 
functionality  and  quality  on  the  quality 
responses  of  the  product  architectures? 
How  can  software  product  line  specialists 
utilize  the  result  of  the  characterizations  of 
conformance  checks  between  a  product 
line’s  CA  and  PAs  for  checking  confor¬ 
mance  of  the  code-dependent  (as-built) 
architecture  to  the  documented  (as- 
designed)  PAs?  Finally,  while  tool  support 
is  always  a  plus,  the  exact  details  of  sup¬ 
port  for  quality  conformance  checking  and 
traceability  in  a  product  line  context  have 
yet  to  be  worked  out. 

Some  of  these  issues  will  be  addressed  in 
our  future  research. 
Abstract

In  this  position  paper  we  investigate  the 
use  of  dynamic  analysis  to  determine 
commonalities  and  variation  points  as  a 
first  step  to  the  migration  of  similar  but 
separate  versions  of  a  software  system 
into  an  integrated  product  line.  The  ap¬ 
proach  detects  forks  and  merges  in  differ¬ 
ent  execution  traces  as  an  indication  of 
variation  points.  It  is  illustrated  by  a  sim¬ 
ple  implementation,  which  is  applied  to  an 
academic  example.  Finally  we  formulate  a 
number  of  research  issues  that  need  to  be 
investigated  further. 
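
To give a feel for the idea of forks and merges, the sketch below aligns two execution traces and reports the regions where they diverge. It is only one possible reading of the approach and is not the authors' implementation: it uses sequence alignment from Python's standard `difflib` module, and the traces and the function name `variation_candidates` are hypothetical.

```python
from difflib import SequenceMatcher


def variation_candidates(trace_a, trace_b):
    """Report regions where two execution traces diverge.

    Each trace is a list of event names (for example, called functions).
    A divergence starts at a fork (the last common event before the traces
    differ) and ends at a merge (the first common event afterwards).
    """
    candidates = []
    matcher = SequenceMatcher(a=trace_a, b=trace_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # Common behaviour: a commonality, not a variation.
        candidates.append({
            "fork_after": trace_a[i1 - 1] if i1 > 0 else None,
            "merge_at": trace_a[i2] if i2 < len(trace_a) else None,
            "only_in_a": list(trace_a[i1:i2]),
            "only_in_b": list(trace_b[j1:j2]),
        })
    return candidates


# Hypothetical traces of two versions of the same system.
BASE = ["main", "init", "pay", "log", "exit"]
VARIANT = ["main", "init", "pay_mobile", "notify", "log", "exit"]

if __name__ == "__main__":
    for candidate in variation_candidates(BASE, VARIANT):
        print(candidate)
```

For these two traces the sketch reports a single divergence that forks after `init` and merges at `log`, which would mark the payment step as a candidate variation point.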

5.1  Introduction 

Already  many  successes  have  been  re¬ 
ported  with  respect  to  the  use  of  product 
line  approaches  in  software  development 
organizations [1]. A company that migrates
to a product line approach must
define  a  product  line  architecture  that  in¬ 
corporates  the  design  decisions  common 
to  all  product  line  members.  Additionally, 
the  variability  between  the  different  prod¬ 
uct  line  members  is  to  be  made  as  explicit 
as  possible. 

In  practice,  the  idea  of  following  a  product 
line  approach  can  be  applied  in  various 
levels  of  detail.  For  example,  one  can  de¬ 


fine  a  reference  architecture  which  speci¬ 
fies  all  commonalities  between  products 
but  does  not  make  the  variation  points 
explicit.  As  such,  we  can  distinguish  be¬ 
tween  various  maturity  levels  in  a  product 
line  deployment  [2].  This  is  also  illus¬ 
trated  in  an  industrial  example  discussed 
by Graaf et al. [3].

Adopting more product line concepts, and
thereby raising the maturity level, is
typically beneficial when a company has
developed several versions of a product for
different customers. All these versions are
extended in one or more ways with respect to
some original system that was initially
developed. At
some  point  a  customer  comes  along  that 
requires  some  of  the  extensions  that  were 
already  implemented,  but  for  different 
versions  of  the  product,  and  thus  their  im¬ 
plementations  reside  in  different  develop¬ 
ment  branches.  As  more  versions  are  be¬ 
ing  developed,  such  a  situation  becomes 
more  and  more  likely.  At  that  point  these 
extensions  should  be  reengineered  into 
clearly  defined,  configurable  features  by 
making  variation  points  explicit,  ideally 
enabling  late  binding.  Domain  and  appli¬ 
cation  engineering  methods  have  been 
proposed  to  solve  this  problem.  Typically 
these  approaches  are  applied  in  a  context 
where  a  product  line  is  developed  from 
scratch, and do not take existing source
code  into  account.  However,  new  product 
lines  are  typically  not  developed  from 
scratch,  but  evolve  from  a  set  of  similar, 
traditionally  developed  products.  Fur¬ 
thermore,  often  many  design  decisions  are 
only  explicit  in  the  source  code.  In  this 
paper  we  consider  the  problem  of  detect¬ 
ing  forks  and  merges  in  the  execution 
traces  generated  by  different  versions  of  a 
system  so  as  to  identify  its  variation 
points.  The  remainder  of  this  paper  is  or¬ 
ganized  as  follows.  Section  5.2  discusses 
some  related  work.  In  Section  5.3  the  ba¬ 
sic  idea  of  how  execution  traces  can  help 
in  identifying  variation  points  is  pre¬ 
sented.  Section  5.4  explains  a  simple  im¬ 
plementation  of  this  idea  that  detects  forks 
and  merges  in  execution  traces.  This  im¬ 
plementation  is  applied  to  a  simple  exam¬ 
ple  in  Section  5.5.  The  paper  is  concluded 
with  some  discussions  and  directions  for 
future  work  in  Section  5.6. 

5.2  Related  Work 

Van  Gurp  et  al.  [4]  provide  an  excellent 
introduction  to  the  concepts  of  variability 
in  software  product  lines  and  discuss  how 
variability  can  be  documented  using  fea¬ 
ture  graphs.  However,  they  do  not  discuss 
in  much  detail  how  commonalities  and 
variation  points  can  be  discovered. 

Approaches  for  domain  engineering  aim  at 
identifying  commonalities  and  variabili¬ 
ties  for  the  definition  of  product  line  archi¬ 
tectures.  Scope,  variability,  and  common¬ 
ality  (SCV)  analysis  discussed  by  Coplien 
et  al.  [5]  provides  a  systematic  way  of 
thinking  about  commonality  and  variabil¬ 
ity.  The  same  work  also  introduces  FAST, 
an  approach  for  domain  engineering  based 
on  SCV-thinking.  Other  domain  engineer¬ 


ing  approaches  are  FODA  [6]  and  FORM 
[7].  Typically  these  approaches  are  based 
on  the  analysis  of  high-level  information, 
such  as  requirements  to  identify  variabili¬ 
ties  and  commonalities. 

Execution  traces  have  been  used  for  many 
purposes  in  the  program  analysis  commu¬ 
nity.  However,  in  only  a  few  cases  traces 
from  different  programs  were  compared  to 
each  other.  Much  of  the  work  is  concerned 
with  identifying  which  components  are 
required  for  a  specific  feature  or  set  of 
features. 

The  software  reconnaissance  technique 
proposed  by  Wilde  and  Scully  [8]  com¬ 
pares  execution  traces  of  different  sets  of 
scenarios  to  identify  which  components 
are  required  for  a  specific  feature. 

Eisenbarth  et  al.  [9]  apply  formal  concept 
analysis  to  execution  traces  that  each  ex¬ 
hibit  a  different  feature,  so  as  to  identify 
feature-component  relations.  As  such  they 
also  investigate  the  commonalities  and 
variabilities  between  different  features  in 
terms  of  the  components  required  to  im¬ 
plement  them. 

These  approaches  compare  different  exe¬ 
cution  traces  of  the  same  program.  There¬ 
fore,  they  rely  on  the  assumption  that  the 
exhibition  of  a  certain  feature  can  be  con¬ 
trolled  by  the  user,  which  is  not  always  the 
case. 

5.3  Tracing  and  Variation 
Points 

Suppose  we  have  two  branches  of  a  soft¬ 
ware  system,  one  being  the  base  system 
and  the  other  a  variant  with  one  or  more 
additional  features.  Detection  of  variation 
points  using  execution  traces  is  based  on 
the idea illustrated by Figure 5-1. In this
graph,  we  have  projected  one  trace  on  top 
of  the  other.  Each  node  in  the  graph  de¬ 
notes  the  usage  of  a  component  for  the 
execution  of  a  scenario.  The  arcs  indicate 
the  order  in  which  the  components  were 
used.  The  fork  in  Figure  5-1  can  be  con¬ 
sidered  the  variation  point.  All  behavior 
executed  up  to  the  split  is  common  behav¬ 
ior  and  the  components  that  are  used  after 
the  split  are  feature-specific. 

The  components  considered  in  an  execu¬ 
tion  trace  are  units  of  source  code.  Differ¬ 
ent  levels  of  granularity  are  possible: 
statement,  method,  class,  package  or  other 
abstractions. 

Execution  traces  are  obtained  by  execut¬ 
ing  some  scenario.  Comparison  of  execu¬ 
tion  traces  is  only  meaningful  when  the 
corresponding  scenarios  are  either  the 
same  or  very  similar.  In  this  context  a  sce¬ 


nario  is  defined  by  the  input  offered  to  the 
system.  We  do  not  consider  the  system’s 
response  as  part  of  the  scenario,  as  the 
intention  is  to  execute  scenarios  on  differ¬ 
ent  systems  that  yield  different  responses. 

For  the  localization  of  variation  points  in 
the  implementation  that  correspond  with 
the  specific  features,  we  need  two  execu¬ 
tion  traces:  one  in  which  the  extension  is 
exhibited  and  one  in  which  it  is  not.  De¬ 
pending  on  the  feature,  it  may  or  may  not 
be  possible  for  the  two  scenarios  that  gen¬ 
erate  these  traces  to  be  identical.  In  case 
the  exhibition  of  a  certain  feature  depends 
on  the  input,  different  scenarios  are 
needed.  This  can  be  the  case,  for  example, 
when  the  feature  is  only  activated  when  a 
user  clicks  a  certain  GUI  button.  If  activa¬ 
tion  does  not  depend  on  the  scenario,  we 
compare  execution  traces  generated  by 
various  versions  of  a  system. 


Figure  5-1:  Forks  and  merges  in  an  execution  trace 


The  underlying  assumption  in  our  ap¬ 
proach  is  that  both  execution  traces  will 
largely  resemble  each  other  and  the  asso¬ 
ciated  graphs  will  have  most  nodes  in 


common,  up  to  the  point  where  the  addi¬ 
tional  feature  is  exhibited  (Figure  5-1). 
Automatic detection of such a fork is trivial:
we take the node before the first deviation
in the two execution traces. The
detection  of  a  merge,  however,  is  more 
involved.  Simply  detecting  the  first  pair  of 
nodes  that  are  identical  after  the  fork 
might  not  be  meaningful.  Usage  of  a  spe¬ 
cific  component  in  both  traces  does  not 
necessarily  imply  that  the  same  behavior 
was  demonstrated  from  a  user’s  perspec¬ 
tive.  The  next  section  will  describe  a  solu¬ 
tion  to  this  issue  using  an  evolving  com¬ 
parison  window. 

The  generation  of  traces  that  can  be  com¬ 
pared  meaningfully  is  even  more  compli¬ 
cated  if  non-deterministic  behavior  is  con¬ 
sidered  (e.g.,  in  games). 

5.4  Approach 

In  this  section,  we  first  present  the  running 
example  that  is  used  in  the  remainder  of 
this  paper  to  illustrate  our  approach  for  the 
identification  of  variation  points  using 
execution  traces.  Next,  we  explain  how 
we  obtain  those  traces  and  finally  how 
they  are  processed. 

5.4.1  Running  example:  Pacman 

The  system  we  use  as  a  running  example 
in this paper is a Java-based game called
Pacman.  With  a  little  imagination,  we  can 
regard  Pacman  as  a  simple  example  of  a 
software  product  line. 

Pacman is a modest software system
consisting of 20 Java classes and
approximately 1000 lines of code. As in a
software product line, there exist several
variants  of  this  system,  each  with  distinct 
added  features. 

For  example,  in  the  reference  system  there 
is  one  hardcoded  map  being  loaded  when¬ 
ever  a  game  is  played.  In  another  version, 
which  can  be  considered  a  member  in  our 


product  line,  functionality  has  been  added 
(in  a  separate  class)  to  read  user-defined 
maps  from  a  file.  Yet  another  version  of 
the system features an additional type of
entity on the map with which the player
and  the  monsters  can  interact. 

5.4.2  Dynamic  analysis  using  as¬ 
pects 

We  obtain  execution  traces  by  instrument¬ 
ing  the  system  with  trace  statements.  We 
add  these  trace  statements  by  means  of 
aspect-oriented  programming.  Aspect- 
oriented  programming  is  extremely  suit¬ 
able  for  implementing  a  crosscutting  con¬ 
cern  such  as  tracing  since  it  allows  us  to 
add  code  at  various  program  locations 
with  limited  effort.  We  use  AspectJ  to 
weave  additional  code  in  the  system  such 
that,  whenever  a  method  is  called,  a  mes¬ 
sage  is  printed  to  a  log  file.  This  message 
contains  both  the  method  being  called  and 
the  class  to  which  this  method  belongs. 
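
The effect of such tracing can be imitated in plain Java. The sketch below is not the AspectJ aspect used here: since AspectJ is not part of the Java standard library, it swaps in a dynamic proxy (java.lang.reflect.Proxy) that records "Class.method" for every call made through an interface. The names ProxyTracer, GameApi, Game, and traced are hypothetical and exist only for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class ProxyTracer {
    /** In-memory stand-in for the log file. */
    static final List<String> LOG = new ArrayList<>();

    /** Hypothetical interface of a traced component. */
    interface GameApi {
        void initialize();
        void loadWorld();
    }

    /** Hypothetical component whose calls we want to trace. */
    static final class Game implements GameApi {
        public void initialize() { }
        public void loadWorld() { }
    }

    /** Wraps target so that every call through iface is logged as "Class.method". */
    @SuppressWarnings("unchecked")
    static <T> T traced(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            LOG.add(target.getClass().getSimpleName() + "." + method.getName());
            return method.invoke(target, args);
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        GameApi game = traced(new Game(), GameApi.class);
        game.initialize();
        game.loadWorld();
        LOG.forEach(System.out::println);
    }
}
```

Unlike weaving, a proxy only sees calls made through the wrapped interface, so it is a much coarser instrument than the aspect; it merely shows the shape of the trace entries.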

Now,  we  can  generate  traces  containing 
the  methods  called  during  execution.  De¬ 
pending  on  the  desired  level  of  granularity 
of  variation  point  detection,  we  may  need 
to  further  process  this  trace,  e.g.  to  gener¬ 
ate  a  trace  on  the  class  level. 
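
A minimal sketch of such post-processing follows. It assumes that every trace entry has the form "Class.method" and that consecutive calls within the same class collapse into a single class-level entry; the collapsing rule and the names TraceLifter and toClassLevel are our assumptions, not part of the implementation discussed here.

```java
import java.util.ArrayList;
import java.util.List;

final class TraceLifter {
    /** Maps "Class.method" entries to class names, collapsing consecutive repeats. */
    static List<String> toClassLevel(List<String> methodTrace) {
        List<String> classTrace = new ArrayList<>();
        for (String entry : methodTrace) {
            int dot = entry.lastIndexOf('.');
            // Entries without a dot are kept unchanged.
            String cls = dot < 0 ? entry : entry.substring(0, dot);
            boolean repeatsPrevious = !classTrace.isEmpty()
                    && classTrace.get(classTrace.size() - 1).equals(cls);
            if (!repeatsPrevious) {
                classTrace.add(cls);
            }
        }
        return classTrace;
    }

    public static void main(String[] args) {
        List<String> trace = List.of(
                "Pacman.main", "Engine.<init>", "Game.<init>", "Game.empty", "Game.initialize");
        System.out.println(toClassLevel(trace)); // prints [Pacman, Engine, Game]
    }
}
```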

Alternatively,  one  could  use  the  Java  Vir¬ 
tual  Machine  Profiler  Interface  (JVMPI) 
to  collect  traces  from  a  system,  as  is  done, 
for  example,  by  Reiss  and  Renieris  [10]. 

5.4.3  Determining  variation  points 

When  dealing  with  software  product  lines, 
each  of  the  product  line  members  gener¬ 
ally  contains  a  set  of  features.  Typically, 
the  members  have  some  of  these  features 
in common whereas others are product-specific.
If an architect is to combine two or more
products, the components responsible for the
latter type of features (the variation points)
must be determined.

We  propose  a  method  in  which  we  com¬ 
pare  the  traces  generated  by  two  versions 
of  a  similar  system  to  discover  variation 
points.  On  the  one  hand  we  have  a  trace 
generated  by  the  reference  system,  called 
the  reference  trace,  and  on  the  other  hand 
a  trace  generated  by  an  extended  version, 
called  the  feature  trace.  These  traces  are 
to  be  obtained  by  running  both  systems 
using  similar  scenarios:  ideally,  the  latter 
differs  from  the  former  only  in  that  the 
specific  extension  is  exhibited. 

As  mentioned  in  Section  5.3,  branches  are 
not  necessarily  considered  merged  as  soon 
as  the  two  traces  once  again  have  one 
method  in  common.  For  this  reason,  we 
require  the  traces  to  have  multiple  con¬ 
secutive  methods  in  common. 

The  algorithm  being  applied  reads  as  fol¬ 
lows: 

1.  Compare  the  traces  of  the  reference 
system  and  the  product  line  member 
line  by  line. 

2.  If the two methods at hand differ, the
traces have split into branches. Create
an N-size checksum of the current
reference method and the next N-1
methods (henceforth, we will call this
the reference window).

3.  Next,  create  a  checksum  of  the  up¬ 
coming  N  methods  in  the  feature 
trace,  thus  creating  the  feature  win¬ 
dow. 

4.  If  the  checksums  are  equal,  the 
branches  are  considered  to  have 
merged.  If  they  do  not  match,  shift 
the  feature  window  down  one 
method,  thus  creating  a  new  feature 
checksum.  Repeat  this  step  a  maxi¬ 
mum  of  M  times. 


5.  If the checksums still do not match,
shift the reference window down one
method and repeat the previous step for
the new reference window. Repeat the
current step a maximum of M times.

6.  If  there  is  still  no  match,  either  M  is 
too  small  or  the  branches  never 
merge,  i.e.  the  systems  never  again 
exhibit  the  same  behavior  at  the 
method  level. 
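
The steps above can be sketched in Java as follows. This is an illustration under stated assumptions rather than the implementation used for the experiments: List.hashCode() stands in for the checksum (with equals() guarding against collisions), "a maximum of M times" is read as 0 to M shifts of each window, and the class name, method names, and example traces are hypothetical.

```java
import java.util.List;

final class TraceComparer {
    /** Step 1: index of the first differing entry; the fork is the entry before it. */
    static int firstDeviation(List<String> ref, List<String> feat) {
        int limit = Math.min(ref.size(), feat.size());
        int i = 0;
        while (i < limit && ref.get(i).equals(feat.get(i))) {
            i++;
        }
        return i;
    }

    /** True if the n-entry windows starting at r and f both exist and are identical. */
    static boolean windowsMatch(List<String> ref, int r, List<String> feat, int f, int n) {
        if (r + n > ref.size() || f + n > feat.size()) {
            return false;
        }
        List<String> refWindow = ref.subList(r, r + n);
        List<String> featWindow = feat.subList(f, f + n);
        // Compare the cheap checksums first; equals() rules out hash collisions.
        return refWindow.hashCode() == featWindow.hashCode() && refWindow.equals(featWindow);
    }

    /**
     * Steps 2 to 6: returns {referenceIndex, featureIndex} of the merge,
     * or null if no merge is found within m shifts of either window.
     */
    static int[] findMerge(List<String> ref, List<String> feat, int start, int n, int m) {
        for (int r = start; r <= start + m; r++) {        // step 5: shift reference window
            for (int f = start; f <= start + m; f++) {    // step 4: shift feature window
                if (windowsMatch(ref, r, feat, f, n)) {
                    return new int[] {r, f};
                }
            }
        }
        return null;                                      // step 6: no merge within M
    }

    public static void main(String[] args) {
        // Invented traces, loosely modelled on the map-loading extension.
        List<String> ref = List.of("Game.initialize", "Game.getWorldMap",
                "Game.loadWorld", "Game.empty", "Game.empty");
        List<String> feat = List.of("Game.initialize", "MapFactory.<init>",
                "MapFactory.readMapFromFile", "MapFactory.copy",
                "Game.loadWorld", "Game.empty", "Game.empty");
        int deviation = firstDeviation(ref, feat);
        int[] merge = findMerge(ref, feat, deviation, 2, 5);
        System.out.println("fork after index " + (deviation - 1));
        System.out.println(merge == null
                ? "no merge found"
                : "merge at reference " + merge[0] + ", feature " + merge[1]);
    }
}
```

Scanning the feature window in the inner loop mirrors the order of steps 4 and 5; a production version would probably cache window checksums instead of recomputing them for every pair.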

The  values  for  N  and  M  are  variable  and 
depend  on  several  factors.  In  assigning 
suitable  values  to  these  variables,  impor¬ 
tant  factors  include  the  size  of  the  system 
and  the  predicted  impact  (in  terms  of  the 
amount  of  associated  method  calls)  of  the 
feature  at  hand.  We  expect  the  architect  to 
have  sufficient  knowledge  of  the  system  at 
hand  to  choose  appropriate  values  for  M 
and  N. 

The  branching  behavior  derived  by  the 
algorithm  can  be  visualized  by  presenting 
contexts  (of  predefined  sizes)  of  all  fork¬ 
ing  and  merging  points  in  the  traces  to  the 
user.  By  visualizing  and  inspecting  the 
branching  behavior,  the  architect  has  a 
way  of  identifying  which  methods  and 
classes  account  for  member-specific  fea¬ 
tures.  Having  approximated  these  varia¬ 
tion  points,  it  takes  much  less  effort  to 
merge  the  two  versions  than  if  the  entire 
systems  had  required  close  inspection. 

5.5  Preliminary  Results 

To  illustrate  the  method  presented  in  the 
previous  section  we  have  conducted  some 
experiments  on  the  Pacman  system  de¬ 
scribed  earlier. 

In  this  section,  we  will  highlight  the  ex¬ 
periment  involving  Pacman’s  reference 
system  and  the  modified  version  featuring 
separate  map  handling. 
[Figure: both traces share node 6 (Game.initialize) and then fork. The
reference trace continues with Game methods (8: Game.loadWorld,
9: Game.empty), the feature trace with MapFactory methods
(8: MapFactory.readMapFromFile, 9: MapFactory.copy).]

Figure 5-2: A fork and its context in the trace.


5.5.1  Generating  traces 

Choosing  appropriate  scenarios  is  rela¬ 
tively  easy  in  this  case,  as  loading  maps  is 
part  of  the  initialization  phase  and  there¬ 
fore  not  subject  to  human  intervention.  It 
is simply a matter of running both programs
and exiting without actually having
played a game.


Part of a method trace as generated by use
of the aspect mentioned in Section 5.4.2 is
depicted in Listing 5-1.

Incorporation  of  the  actual  stack  depths  is 
not  part  of  the  results  discussed  here  and 
is  subject  to  future  research. 
Pacman.main
Pacman.<init>
Engine.<init>
Game.<init>
Game.empty
Game.empty
Game.<init>
Game.initialize


Listing  5-1:  Part  of  a  method  trace 

5.5.2  Branching  behavior 

We  are  now  ready  to  compare  the  traces 
by  using  the  algorithm  described  in  Sec¬ 
tion  5.4.3.  However,  we  need  to  define 
some  parameters  first. 

Since  we  are  considering  a  small  system 
and  a  not  so  complicated  feature,  we  do 
not  expect  branches  to  be  very  long,  e.g. 
perhaps  tens  of  methods  at  most.  For  the 
same  reason  we  will  set  the  checksum  size 
at  a  relatively  small  value,  e.g.,  5  methods. 
Finally,  the  size  of  the  context  being  pre¬ 
sented  to  the  user  is  set  to  7. 

The  results  can  be  viewed  in  Figure  5-2 
and  Figure  5-3.  Figure  5-2  depicts  the 
context  of  the  point  where  the  feature 
trace  started  deviating  from  the  reference 


trace.  One  can  easily  see  that  whereas  in 
the  original  version  a  local  method  is  in¬ 
voked  to  get  a  map,  the  other  version  in¬ 
stantiates  a  whole  new  class  that  deals 
with  the  map  handling. 

Figure 5-3 illustrates that not many method
calls later, the branches have merged.
From  here  on,  the  traces  are  apparently 
similar. 

Judging by the visualizations (if presented
at the correct abstraction level), an
architect can easily isolate the
feature-specific part and, if desired,
migrate the components associated with this
variation point towards other existing
product line members.
Figure 5-3: A merge and its context in the trace.


5.6  Discussion  and  Future 
Work 

Effort.  To  repeat  our  experiment  on  a  dif¬ 
ferent  subject  system,  one  can  apply  the 
following  process: 

1.  Perform a quick (1-hour) exploration
of the system to gain some insight into
its structure. This provides an initial
estimate for the values of the M and
N parameters.

2.  Determine  appropriate  scenario(s) 
that  exercise  the  desired  features. 

3.  Add  tracing  instrumentation  to  the 
system,  e.g.  by  weaving  aspects. 


4.  Collect  execution  traces  for  given 
scenarios  and  (automatically)  com¬ 
pare  them  to  find  variation  points. 

5.  If  desired,  repeat  step  4  using  alterna¬ 
tive  values  for  M  and  N  to  fine-tune 
the  results. 

Precision.  In  the  current  implementation 
we  more  or  less  assume  that  a  merge  point 
is  not  located  arbitrarily  far  from  a  fork. 
Hence, we introduced the M-parameter in
our  detection  algorithm.  This  assumption 
is  valid  because  we  require  that  one  ver¬ 
sion  is  a  strict  extension  over  the  other. 

If  we  abandon  this  requirement,  we  would 
have  to  search  both  execution  traces  all 
the way to the end to find potential
merges. The complexity of this search is
O(n²), which could be problematic for
systems of realistic size, involving traces
consisting  of  millions  of  components.  This 
is  why  we  advocate  a  sensible  value  for 
M:  a  value  defined  by  the  architect,  based 
on  how  much  impact  he  expects  the  par¬ 
ticular  feature  to  have  on  the  given  trace 
granularity  level  (method  level  in  the  case 
presented  here).  In  the  future  we  may  be 
able  to  automatically  determine  optimal 
values  for  specific  systems. 
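
To make the cost argument concrete, the following back-of-the-envelope estimate compares the two searches. It assumes (our reading of steps 4 and 5 of the algorithm, not a measured result) that each window is shifted between 0 and M times and that one window comparison inspects N entries; the values n = 10^6, M = 50, and N = 5 are purely illustrative.

```latex
\begin{align*}
C_{\text{unbounded}} &= O(n^2), &
  n = 10^{6} &\;\Rightarrow\; n^2 = 10^{12} \text{ entry pairs},\\
C_{\text{bounded}} &\le (M+1)^2 \cdot N, &
  M = 50,\ N = 5 &\;\Rightarrow\; 51^2 \cdot 5 = 13\,005 \text{ entry comparisons per fork}.
\end{align*}
```

Under these assumptions the bounded search costs the same per fork regardless of the trace length, which is why a sensible M matters more than raw trace size.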

Future  work.  To  render  identification  of 
variation  points  feasible  in  the  case  of 
complex  systems,  we  need  more  refined 
techniques.  One  approach  is  to  take  into 
account  not  only  the  methods  being  called 
but  also  their  actual  parameters.  This 
would  require  a  straightforward  extension 
of  the  tracing  instrumentation. 

Another  option  is  to  also  look  at  the  stack 
depth  or  maybe  even  the  complete  stack 
whenever  a  method  is  called.  Both  these 
extensions  to  our  technique  potentially 
allow  for  the  detection  of  extra  forks,  and 
increase  the  probability  that  an  identical 
entry  in  the  two  call  traces  indeed  implies 
that  the  two  versions  were  again  exhibit¬ 
ing  common  behavior,  from  a  user’s  per¬ 
spective.  Probably  this  also  means  that  the 
N-parameter can be smaller, which in turn
reduces  the  cost  of  the  checksum  calcula¬ 
tions. 

An  alternative  approach  in  dealing  with 
systems  of  realistic  size  would  be  to  not 
directly  analyze  the  method  trace,  but  to 
first  lift  its  elements  to  higher  levels  of 
abstraction,  e.g.,  from  methods  to  classes 
or  packages.  To  this  end,  we  would  first 
have  to  extract  information  with  respect  to 


the  structural  decomposition  of  the  sys¬ 
tem. 

Finally,  once  a  feature  is  localized  a  next 
step  is  to  modularize  the  code  that  imple¬ 
ments  it.  To  provide  guidelines  for  this 
step  we  will  investigate  whether  the  num¬ 
ber  of  times  two  traces  intersect  (in  terms 
of  identical  methods  being  called)  before 
the  same  behavior  is  exhibited  (as  defined 
by the N-parameter) could be a measure
for  the  degree  of  ‘crosscuttingness’  of  a 
feature,  and  hence  for  the  effort  required 
to  reengineer  such  a  feature  into  a  reusable 
component. 
Abstract

Domain-specific  reuse  is  seen  as  a  promis¬ 
ing  way  to  increase  the  value  of  reuse. 

This  paper  reports  our  ongoing  work 
aimed  at  identifying  domain-specific 
software  components  from  an  existing 
system  to  achieve  large-scale  reuse.  The 
fundamental  motivation  of  the  proposed 
method  is  to  reduce  the  amount  of  source 
code  the  human-expert  has  to  explore  in 
order to identify domain-specific candidates.
The basic premise assumed by this method is
that reusable components have certain quality
attributes, such as functional usefulness,
readability, and testability, which can be
measured to a certain extent with the help
of metrics.

Keywords:  domain  engineering,  reuse, 
reverse  engineering,  metrics,  software 
product  lines 


6.1  Introduction 

Software reuse is considered a promising
way of developing systems. It helps an
organization improve its productivity and
quality. Software reuse can be applied to
any life cycle product, not only to source
code. Jones [10] identifies ten potentially
reusable aspects of software projects, as
shown in Table 6-1. (The ordering of aspects
in Table 6-1 is not with respect to
priority.)


However, in practice, the granularity of
reuse is small. That is, very often, utility
libraries (e.g., string and math libraries)
are reused across products. The value of
such reuse is quite limited [8].


 1. architecture        6. estimates
 2. source code         7. human interfaces
 3. data                8. plans
 4. design              9. requirements
 5. documentation      10. test cases

Table 6-1: Reusable Aspects of Software
Projects


1  This work is partially funded by German ministry under EUREKA 2023/ITEA-ip00009 ’FAct
based Maturity through Institutionalization Lessons-learned and Involved Exploitation of
System-family engineering’ (FAMILIES).

In a mature domain, most of the required
solutions already exist in current
implementations. It
has  been  argued  in  [13]  that  there  are  three 
categories  of  software  that  make  up  a  sys¬ 
tem: 

•  Utility components contribute to 20%
of the whole application size.

•  Domain-specific components contribute
to 65% of the whole application size.

•  Application-specific components
contribute to 15% of the whole
application size.

The  distribution  shows  that  reuse  of  do¬ 
main-specific  components  from  an  exist¬ 
ing  system  has  the  most  potential  in  reduc¬ 
ing  the  development  cost  and  maintenance 
effort  [1].  The  identification  of  domain- 
specific  components  is  not  an  obvious  task 
since  systems  are  typically  developed  for 
a  single  customer.  Designers  and  engi¬ 
neers  thereby  do  not  distinguish  between 
domain-specific  and  application-specific 
components  [9]  as  it  is  explicitly  done  in 
product  line  engineering.  So  these  compo¬ 
nents  are  not  organized  separately.  Hence, 
expert  effort  has  to  be  spent  to  make  these 
components  apparent. 

We  believe  that  reverse  engineering  can 
help  to  identify  domain-specific  compo¬ 
nents  and  therefore  to  support  the  reuse 
activities  by  reducing  the  expert  effort  in 


searching  for  component  candidates, 
which  is  a  problem  especially  for  large 
systems.  Our  approach  helps  experts  to 
semi-automatically  identify  domain- 
specific  components.  From  here  onwards, 
we  limit  our  discussions  only  to  object- 
oriented  (OO)  systems.  And  consequently, 
the  term  component  refers  to  the  collec¬ 
tion  of  functionally  related  classes  with 
specification  of  required  and  provided 
interfaces. 

The  fundamental  motivation  of  the  pro¬ 
posed  approach  is  to  reduce  the  amount  of 
data  the  human  expert  has  to  review  in 
order  to  identify  domain-specific  classes. 
The  basic  premise  assumed  by  this 
method  is  that  reusable  classes  have  cer¬ 
tain  quality  attributes  like  functional  use¬ 
fulness,  readability,  testability,  etc.  These 
quality  attributes  are  mapped  on  metrics 
(e.g.,  by  using  the  GQM  method).  The 
method  classifies  the  domain-specific 
classes  based  on  the  metrics  derived.  The 
expert has to validate only a small number
of proposed candidates, which, if accepted,
then become the foundation of
reusable components (see Figure 6-1).

The remainder of the paper is organized as
follows: Section 6.2 explains the factors
affecting reusability. Section 6.3 presents
the component extraction method. Section 6.4
presents the related work, while Section 6.5
concludes this work.
Figure 6-1: Process model for component extraction


6.2  Approach 

6.2.1  Terminology 

Forward  Engineer:  An  engineer  who 
wants  to  reuse  existing  components  of  the 
same  or  similar  domains  to  reduce  devel¬ 
opment  effort. 

Domain  Expert:  A  specialist  with  de¬ 
tailed  knowledge  of  the  domain  who  is 
also  familiar  with  the  architecture  of  the 
system  where  the  existing  components 
reside. 

Reverse Engineer: A person who understands
OO metrics and is capable of analyzing an
existing system (expertise in the domain is
not needed).


6.2.2  Factors  Affecting  Reusability 

Figure  6-2  shows  a  “fishbone  diagram” 
that  represents  the  factors  affecting  reus¬ 
ability.  It  can  be  observed  from  this  figure 
that  reusability  depends  on  Usefulness, 
Costs  and  Quality.  Each  of  these  factors  is 
explained  below. 

Usefulness 

To  be  reused,  a  prerequisite  is  that  the 
component  implements  functionality  that 
is  useful  for  the  new  system.  It  is  ex¬ 
tremely  hard  to  decide  in  an  automated 
way  whether  or  not  a  component  will  be 
useful  in  a  new  system,  since  this  decision 
is  based  on  domain  knowledge  and  the 
requirements  of  the  new  system.  However, 
an indirect automatable measure of
usefulness was developed to measure the
reusability of the existing component within
the analyzed system itself (i.e., its origin).
The assumption is that the highly used
components within a system are good
candidates for reuse in a new context.


There  is  also  a  limitation  because  of  our 
assumption:  We  tend  to  exclude  those  do¬ 
main-specific  components  that  are  not 
frequently  used  in  the  existing  system.  It 
is  important  to  note  that  the  domain  expert 
is  crucial  to  decide  about  the  usefulness  of 
a  component  candidate. 




Cost 

Reuse cost includes the cost of identifying
a component in the existing system,
modifying it, and integrating it into a new
system. Measures of the size and complexity
of a component provide a partial indication
of the difficulty of adapting it for reuse in
a new system. The cost of reusing the
component is also influenced by the
readability of its code, a characteristic
that can again be partially evaluated using
size and complexity measures. That is, small
and simple code fragments are usually easier
to read and adapt than large and complex
fragments.

Quality 

The  quality  of  the  component  is  important 
in  order  to  succeed  in  reuse-driven  devel¬ 
opment.  Several  qualities  that  are  impor¬ 
tant  for  component  reuse  are  correctness, 
readability,  testability,  ease  of  modifica¬ 
tion,  and  performance,  but  most  of  them 
are not directly measurable. Measures of
the size and complexity of a component,
however, provide a partial indication of the
presence of these qualities in it.

6.2.3  Metrics  for  Measuring 
Costs,  Usefulness  and 
Quality 

Table  6-2  contains  definitions  of  the  met¬ 
rics  used  for  measuring  costs,  usefulness, 
and  quality.  A  complete  discussion  about 
these metrics can be found in [5]. Our
motivation is to come up with a reusability
model that contains metrics and suitable
upper and lower bounds to support the
identification of reusable classes.


Metric    Definition

NMPUB     The number of public methods implemented by a class.
WMC       Cyclomatic complexity of a class.
NOC       The number of children/grandchildren of a class.
DIT       The level a class is located from the root in the
          inheritance hierarchy.
OCAEC     The number of times a class has been used as an
          attribute in other classes.
CALL_IN   The total number of times the methods of a class were
          called by other classes.
LCOM      Cohesion: the number of sets of methods that access
          the same attributes.

Table 6-2: Definition of Metrics


Measuring  Usefulness 

We measured the functional usefulness
using the following assumption: a class that
is used frequently within a system is a good
candidate for reuse in a new system in a
similar domain. In OO systems, a class A can
use another class B in the following ways:

•  Methods of A invoke the methods of
B

•  A  contains  an  instance  of  B  as  its  at¬ 
tribute 

•  A  inherits  from  B 

•  A  can  read/write  attributes  of  B 


Hence, we have chosen the metrics NOC,
OCAEC, CALL_IN, and DIT to measure the
usefulness within the existing system
itself.

If a class has many children/grandchildren,
then it is likely that it implements
certain generic functionality. Hence we need
to take only the lower bound for NOC,
because the larger the number of
children/grandchildren, the higher its
assumed genericity.

The reason for choosing the DIT metric is that a class near the leaves of
the inheritance tree in most cases implements specialized functionality.
When identifying reuse candidates, specialized functionality is not the
first priority. That is, we should not go too deep in the inheritance
hierarchy. Hence we need only an upper bound for the DIT metric.

In many applications, classes are not always part of an inheritance tree.
That is, there are classes that have neither a parent nor children, and
such classes might also be good candidates for reuse. In order to include
such classes for potential reuse, we have chosen CALL_IN and OCAEC. The
CALL_IN metric is used to identify those classes that are used heavily by
methods of other classes. The higher the value of CALL_IN, the higher the
usefulness of the class within the system. If the value of CALL_IN is
below a certain value, it is likely that the services of the class are not
that important to the system. Hence we need only a lower bound for
CALL_IN. OCAEC can be used to measure how useful the class is in building
other classes. That is, if a class has a high OCAEC, then it is used as an
attribute in many other classes. As with CALL_IN, we need only a lower
bound for OCAEC.

Measuring  Cost 

We can measure the reuse costs using the metrics NMPUB and WMC. If NMPUB
and WMC are high, then it might take more effort to understand, modify,
and integrate the class into a new system. On the other hand, if both
NMPUB and WMC are too low, it is very likely that the class contains
nothing interesting. So both metrics need a lower and an upper bound, and
a candidate should stay within them.


Measuring  Quality 

We can measure quality using NMPUB, WMC, and LCOM. A high NMPUB is likely
to have an impact on correctness and readability. Testability can be
partially predicted with the help of WMC [3]: the higher the WMC, the
lower the testability of a class. If the cohesion metric LCOM is high, it
is very likely to reduce the understandability and readability of the
class because of the variety of functionality implemented in it. Hence,
only the upper bound is necessary for LCOM.


Figure 6-3: Associating OO metrics with factors affecting reusability

6.3  Process

We introduce an approach for the extraction of domain-specific components
from an existing system, in which reverse engineers, forward engineers,
and domain experts work closely together. Figure 6-4 lists the 10 steps of
our approach in pseudo-code format.

Step 1: Goal Description: In this step, the forward engineer formulates
the goal and explains it to the reverse engineer. The forward engineer can
describe the kind of components he wants to reuse from the existing
system. For instance, let us assume that the forward engineer wants to
build an IDE for modeling software architectures by reusing an existing
IDE for Java. The forward engineer then explains the need for components
implementing functionality related to workspace, projects, package
hierarchy, and file management.

Step 1: The forward engineer describes the goal to the reverse engineer.

Step 2: The reverse engineer sets up the fact base.

Step 3: The reverse engineer selects metrics and chooses their bounds.

Step 4: The reverse engineer identifies candidate classes that satisfy the
criteria defined in step 3.

Step 5: The reverse engineer does a “lightweight” review of the classes
from step 4. If he is not satisfied, he goes back to step 3. Otherwise, he
passes the candidates to step 6.

Step 6: The domain expert reviews the candidates and classifies them.

Step 7: The reverse engineer analyzes the classification made by the
domain expert.

Step 8: Both engineers start building components from the key classes of
step 6.

Step 9: The reverse engineer does an interface analysis to determine the
dependencies between the components from step 8 and the rest of the
existing system.

Step 10: The forward engineer makes the final decision about the reuse of
the components using the output of step 9.

Figure 6-4: Different steps for component extraction

Step 2: Setting up the fact base: The reverse engineer parses the source
code of an existing system and builds an initial model of the source code.
The initial model could be, for example, an RSF representation of the
source code. In addition, for each class in the source model, he computes
the metrics defined in Table 6-2.

Step 3: Select metrics and choose their bounds: In this step, the reverse
engineer chooses bounds for the metrics defined in Table 6-2. The problem
is that there is no analytical method a reverse engineer can use to choose
these bounds. In the first iteration, he therefore computes the average of
the metric values and uses it as the bound(s) for a metric. This amounts
to trial and error, but it is nevertheless a meaningful starting point.
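The first-iteration choice described in step 3 can be sketched in a few lines. This is only an illustration under stated assumptions: the class names and metric values in `FACT_BASE` are made up, and the function `initial_bounds` is a hypothetical helper, not part of any tool mentioned in the paper.

```python
from statistics import mean

# Hypothetical metric values per class; a real fact base would supply these.
FACT_BASE = {
    "Workspace": {"NMPUB": 12, "WMC": 30, "CALL_IN": 40},
    "Project": {"NMPUB": 8, "WMC": 20, "CALL_IN": 25},
    "AboutDialog": {"NMPUB": 4, "WMC": 10, "CALL_IN": 1},
}


def initial_bounds(fact_base):
    """Return the average of every metric as the first-iteration bound."""
    metrics = next(iter(fact_base.values())).keys()
    return {m: mean(values[m] for values in fact_base.values()) for m in metrics}
```

Whether an average serves as the lower or the upper bound depends on the metric (see Table 6-3); later iterations would adjust the values after the lightweight review in step 5.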


Metric      Minimum   Maximum
----------  --------  --------
NMPUB       X         X
WMC         X         X
LCOM                  X
CALLS_IN    X
DIT                   X
NOC         X
OCAEC       X

Table 6-3: Metrics with Lower and Upper Bound

Step 4: Identify candidates: In this step, the reverse engineer applies
the metric criteria developed in step 3 to all the classes in the fact
base. Classes that satisfy the criteria are passed on to step 5.
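A minimal sketch of how the criteria from step 3 might be applied in step 4 follows. The sides on which each metric is bounded follow Table 6-3, but the numeric bounds, the class names, and the metric values are invented for illustration. Requiring all metrics to hold at once is also an assumption, since the paper does not state how the individual criteria are combined.

```python
# Hypothetical bounds as (minimum, maximum); None means no bound on that side.
BOUNDS = {
    "NMPUB": (3, 20),
    "WMC": (5, 50),
    "LCOM": (None, 4),
    "CALL_IN": (10, None),  # listed as CALLS_IN in the tables
    "DIT": (None, 3),
    "NOC": (1, None),
    "OCAEC": (2, None),
}

# Hypothetical fact base with precomputed metric values per class.
FACT_BASE = {
    "Workspace": {"NMPUB": 12, "WMC": 30, "LCOM": 2, "CALL_IN": 40,
                  "DIT": 1, "NOC": 3, "OCAEC": 5},
    "AboutDialog": {"NMPUB": 2, "WMC": 6, "LCOM": 1, "CALL_IN": 12,
                    "DIT": 2, "NOC": 1, "OCAEC": 2},
    "GodClass": {"NMPUB": 60, "WMC": 45, "LCOM": 3, "CALL_IN": 90,
                 "DIT": 1, "NOC": 2, "OCAEC": 7},
}


def satisfies(values, bounds):
    """True if every metric value lies within its (minimum, maximum) bound."""
    for metric, (low, high) in bounds.items():
        value = values[metric]
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


def candidates(fact_base, bounds):
    """Names of all classes that satisfy the metric criteria, sorted."""
    return sorted(name for name, values in fact_base.items()
                  if satisfies(values, bounds))
```

Here `AboutDialog` falls below the lower NMPUB bound and `GodClass` exceeds the upper one, so only `Workspace` would be passed to the lightweight review.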

Step 5: Lightweight review: One of the major problems is that reverse
engineers usually do not have expertise in the application domain.
Therefore, the reverse engineer cannot review the candidates identified in
step 4 for their usefulness in a new system. But he can do a lightweight
review to help the forward engineer. For example, if the number of
candidates identified in step 4 is too high, he redefines the criteria
defined in step 3. The reverse engineer also uses the goal description
during the lightweight review of the identified candidates.

Step 6: Review by the domain expert: In this step, the domain expert
reviews the classes that passed step 5 (based on the goal description of
step 1). The main focus of the domain expert in this review is to decide
about the functional usefulness of the candidates. The domain expert
classifies each of the classes given by the reverse engineer as follows:

•  Utility - Classes that implement general utility functionality (for
example, math routines).

•  Application-specific - Classes that implement functionality specific to
a single instance of a product line.

•  Domain-specific - Classes that contain generic functionality needed for
all instances of a product line.

Step 7: Analyze classification: It is important to keep in mind that
domain experts are usually busy. Therefore, the reverse engineer must
minimize the number of candidates the expert has to review but at the same
time maximize the share of domain-specific candidates. To achieve this
goal, the reverse engineer has to analyze the classification of the
candidates by the domain expert. In order to provide the domain expert
with many domain-specific candidates, the reverse engineer applies a
filtering strategy. Filtering is used to reduce the search space for
domain-specific classes. That is, certain classes that are most likely not
domain-specific are ignored:

•  If the root of an inheritance tree is not domain-specific, then it is
likely that the complete inheritance tree is not domain-specific. So, we
can filter all the classes involved in such trees. However, this strategy
needs to be applied carefully: for example, in Java, the class “Object” is
the root class of all classes, but we can develop new applications based
on the existing Java classes.

•  If a class C is an application-specific or utility class, then all the
classes that are dominated by C are likely to be application-specific or
utility classes as well. Domination is defined using the dominance tree,
where the nodes are classes and the edges are the call relations between
the classes. Note that this assumption is not always true; there could be
cases where an application class uses a domain class. Nevertheless, we try
to reduce the search space by making such assumptions.

•  If a class C, which satisfied the bounds of the metric OCAEC, is an
application-specific or utility class, then all the classes that are used
as attributes within the class C are likely to be application-specific.

•  One obvious filtering strategy is to filter out those classes that were
already reviewed by the expert. These reviewed classes can be removed
before the reverse engineer applies the criteria defined in step 3.
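The dominance-based filter from the list above can be sketched as follows. This is a minimal illustration, assuming a class-level call graph with a single entry class; the graph `CALLS`, the class names, and the helper functions are hypothetical. It uses the classic iterative data-flow formulation of dominators rather than any specific algorithm named in the paper.

```python
def dominators(graph, root):
    """Map every class reachable from root to the set of classes dominating it."""
    reachable, stack = {root}, [root]
    while stack:
        for succ in graph.get(stack.pop(), ()):
            if succ not in reachable:
                reachable.add(succ)
                stack.append(succ)

    preds = {node: set() for node in reachable}
    for node in reachable:
        for succ in graph.get(node, ()):
            preds[succ].add(node)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {node: set(reachable) for node in reachable}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for node in reachable - {root}:
            new = {node} | set.intersection(*(dom[p] for p in preds[node]))
            if new != dom[node]:
                dom[node] = new
                changed = True
    return dom


def dominated_by(dom, cls):
    """Classes that can only be reached through cls (excluding cls itself)."""
    return {node for node, doms in dom.items() if cls in doms and node != cls}


# Hypothetical call graph: caller class -> classes whose methods it calls.
CALLS = {
    "Main": ["AppWindow", "Logger"],
    "AppWindow": ["ReportDialog", "Logger"],
    "ReportDialog": ["PdfExporter"],
}
```

If the expert classifies `AppWindow` as application-specific, `dominated_by(dominators(CALLS, "Main"), "AppWindow")` yields `ReportDialog` and `PdfExporter` as classes to filter out, while `Logger` stays in the search space because it is also called directly from `Main`.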

Step 8: Component building: In this step, the forward and reverse
engineers work together to build components from the key classes
identified in step 6. All the dependencies required by the key classes
have to be extracted so that components can be built. This requires an
analysis of the interfaces of the key classes.

Step 9: Interface Analysis: In this step, the reverse engineer analyzes
the dependencies between the components from step 8 and the rest of the
system [11]. Using the fact base, interface analysis identifies all the
required interfaces that are necessary for the execution of a component in
a new system.
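One simple reading of the interface analysis in step 9 is to collect every member outside a component that the component's classes use. The sketch below assumes the fact base offers use relations as (caller class, callee class, callee member) triples; that representation, the data in `USES`, and the function name are assumptions made for illustration only.

```python
# Hypothetical use relations taken from a fact base:
# (caller class, callee class, callee member).
USES = [
    ("Workspace", "Project", "open"),
    ("Project", "FileSystem", "read"),
    ("Project", "Logger", "warn"),
    ("Workspace", "Logger", "warn"),
    ("MainWindow", "Workspace", "load"),
]


def required_interfaces(component, uses):
    """Members outside the component that classes inside the component use."""
    return sorted({
        (callee, member)
        for caller, callee, member in uses
        if caller in component and callee not in component
    })
```

For a component made of `Workspace` and `Project`, the result lists `FileSystem.read` and `Logger.warn`, which a new system would have to provide (or the component would have to absorb) before the component can run there.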

Step 10: Final decision and code analysis: In this step, using the output
of the interface analysis, the forward engineer decides whether to reuse
the component or not. His decision is also influenced by factors such as
performance.

6.4  Related  Work 

Basili  and  Rombach  [2]  describe  a  com¬ 
prehensive  framework  of  models,  model- 
based  characterization  schemes,  and  sup¬ 
port  mechanisms  for  better  understanding, 
evaluating,  planning  and  supporting  all 
aspects  of  reuse.  We  follow  their  reuse- 
oriented  software  environment  model  to 
set  up  a  component  repository  for  product 
line  migration. 


Caldiera and Basili [4] describe Care, which helps identify reusable
components using a user-defined “reusability attribute model” based on
software metrics. We customized this approach to the object-oriented
paradigm to support product line migration activities in the presence of
existing systems.

Dunn and Knight [6] describe a model based on an expert system for the
identification of reusable components from existing systems. The
suitability of this expert system for the object-oriented paradigm needs
further research.

Diaz  and  Freeman  [12]  describe  a  scheme 
to  classify  software  for  reusability.  Their 
premise  is  that  reuse  can  happen  only 
when  there  is  an  automatic  way  of  retriev¬ 
ing  the  required  software  components 
from  the  repository.  Introducing  such  a 
classification  scheme  is  a  part  of  our  fu¬ 
ture  work. 

Etzkorn and Davis [7] describe an approach for automatically identifying
reusable classes in object-oriented systems. Their PATRicia system uses
reusability metrics and a user-defined quality model to identify reusable
classes. Their CHRis tool uses natural-language techniques to help an
expert decide whether a class implements certain useful functionality.

6.5  Conclusion  and  Future 
Work 

In this position paper, we described our ongoing work aimed at identifying
domain-specific reusable components. The fundamental motivation of the
proposed method is to reduce the effort spent by the human expert to
identify domain-specific component candidates. The basic premise of this
approach is that reusable components have certain quality attributes, such
as functional usefulness, readability, and testability, that can be broken
down (at least indirectly) into measurable items.

Our immediate next step is to apply the proposed approach to large-scale
systems to identify its benefits and limitations and to base the default
boundary values for the metrics on the experience we gain.
Abstract

In most cases, adaptation is required to make existing components suitable
for the context defined by a product line architecture. This paper reports
our experience in analyzing the product line adequacy of an existing
component in an industrial context. Product line adequacy is judged from
the results of applying diverse reverse engineering techniques
(architecture evaluation, clone detection, code metrics, and source code
analysis). The paper presents these activities, their results, and the
action items derived to integrate the component into the product line
context.

Keywords: ADORE, product line architecture, PuLSE-DSSA, reengineering,
reverse engineering.

7.1  Introduction 

Product lines are sets of software-intensive systems sharing a set of
features and are derived from a common set of reusable assets [6]. Central
artifacts of product lines are their architectures, which embrace
decisions and principles valid for each derived variant. Hence,
architecture development must ensure the achievement of organizational and
business goals, functional and quality requirements.

Components as part of product line architectures are explicitly developed
for systematic reuse. That is, they must support the scope of variability
required and be flexible towards the anticipated changes. Migrating
existing components into product line components thus means (in addition
to resolving potential architectural mismatches and improving the internal
quality) injecting the required variability support. Existing components
therefore require a certain amount of adaptation to achieve sufficient
product line adequacy. Product line architects face difficult decisions
whether to invest in the migration of existing components or to construct
new product line components from scratch. Hence, they pass on requests to
reverse engineering to analyze the product line adequacy of the existing
components. If the decision is to adapt a component, reengineering
activities are eventually conducted to prepare it for use in a product
line context.

In this paper, we present a particular case of such a decision by
reporting on the analysis of an existing component in an industrial
context, where we applied Fraunhofer PuLSE™ (Product Line Software
Engineering)¹ [2] and Fraunhofer ADORE™ (Architecture- and Domain-Oriented
Reengineering).²

The  paper  is  structured  as  follows:  Section 
7.2  gives  context  information  on  the  case 
study,  while  Section  7.3  presents  the  ap¬ 
plied  approach.  Section  7.4  reports  on 
results  of  the  applied  techniques;  Section 
7.5  concludes  the  paper. 

7.2  Context 

The case study was conducted in a large organization migrating towards
product line engineering following the PuLSE method. The organization
defined a product line architecture for a family of multimedia systems in
the automotive domain. The products consist of two major parts: a panel
(mainly used for user interaction) and the back-end system (mainly used
for computation, network functionality, and external media).

The subject component of our case study is one of the key components of
the panel. This Graphics component is responsible for the complete
interaction between backend and panel, as well as composition and
visualization of the exchanged elements. The user interface is based on
predefined masks. A mask is thereby defined as a collection of graphical
elements and positioning information (e.g., text fields, bitmaps, buttons,
lists, labels). The graphical elements contributing to a mask are divided
into static information relevant for the panel only and dynamic sequence
control information coming from the back-end system. The main
architectural driver is the minimization of the data flow between the two
parts.

1  PuLSE is a registered trademark of Fraunhofer Institute for
Experimental Software Engineering (IESE), Kaiserslautern, Germany.

2  ADORE is a registered trademark of Fraunhofer Institute for
Experimental Software Engineering (IESE), Kaiserslautern, Germany.

7.3  Approach 

The  case  study  combined  two  Fraunhofer 
methods:  PuLSE,  in  particular  its  architec¬ 
tural  component  PuLSE-DSSA  (Domain- 
Specific  Software  Architecture),  and 
ADORE. 

7.3.1  PuLSE™-DSSA 

PuLSE-DSSA deals with product line activities at the architectural level.
Since greenfield scenarios [6] are found only rarely in industrial
contexts, it is designed to smoothly integrate reverse engineering
activities into the process of developing a product line architecture. The
main underlying concepts of PuLSE-DSSA are:

•  Scenario-based development in iterations that explicitly addresses the
stakeholders’ needs.

•  Incremental development, which successively prioritizes requirements
and realizes them.

•  Direct integration of reengineering activities into the development
process on demand.

•  View-based documentation to support the communication of different
stakeholders.

The main process loop of PuLSE-DSSA consists of four major steps (see
Figure 7-1).
Planning: The planning step defines the contents and delineates the scope
of the current iteration. This includes the selection of a limited set of
scenarios that are addressed in the current iteration, the identification
of the relevant stakeholders and roles, the selection and definition of
the views, and the decision whether or not an architecture assessment is
included at the end of the iteration.

Realization: In the realization step, solutions are selected and design
decisions taken in order to fulfill the requirements given by the
scenarios. When solutions are selected and applied, an implicit assessment
is made regarding their suitability for the given requirements and their
compatibility with design decisions of earlier iterations. A catalog of
means and patterns is used in this phase. Means are principles,
techniques, or mechanisms that facilitate the achievement of certain
qualities in an architecture, whereas patterns are concrete solutions for
recurring problems in the design of architectures.

Documentation: This step documents architectures by using the
organization-specific set of views as defined in the planning step. It
thereby relies on standard views as, for example, defined by Kruchten [8]
or Hofmeister [7], and customizes or complements them with additional
views.


Assessment: The goal of the assessment step is to analyze and evaluate the
resulting architecture with respect to functional and quality requirements
and the achievement of business goals. In an intermediate state of the
architecture, this step might be skipped and the next iteration started
directly.

PuLSE-DSSA  results  in  product  line  ar¬ 
chitectures  documented  in  a  selection  of 
architectural  views. 

7.3.2  ADORE™ 

The architecture development yields product line components that have to
be engineered. Different components can be engineered concurrently since
the product line architecture has defined the component communication,
specified the required interfaces, and distributed the responsibilities
among the components. In a migration context from single system
development, this allows the identification of existing components in the
domain that already fulfill the functional requirements completely or at
least partially.

To  decide  about  reusing  such  existing 
components,  the  component’s  internal 
quality  and  suitability  for  the  product  line 
have  to  be  evaluated.  It  has  to  be  ensured 
that  the  component  is  able  to  serve  the 
product  line  needs. 
ADORE™ (Architecture- and Domain-Oriented Reengineering) is a
request-driven reengineering approach that evaluates existing components
with respect to their adequacy and, potentially, integrates such
components into the product line:

•  First, existing components are identified and reverse engineered [4] to
assess their adequacy; this activity is initiated by requests coming
directly from the product line architects.

•  Second, based on the analysis results, the product line architects
decide whether the existing component is reused or a new product line
component is developed from scratch.

•  Finally, when reusing the component, necessary renovation and extension
activities are kicked off to adapt the component for its use within the
product line.


ADORE is mainly instantiated in step 2 of PuLSE-DSSA (realization), when
the architects reason about whether or not to reuse existing components.
The architectural needs drive the selection of appropriate reverse
engineering analyses. Analyses and, potentially, renovation activities are
conducted asynchronously to the PuLSE-DSSA iterations. That is, the
current iteration of the architecture development may proceed even if the
ADORE activities are delayed. The advantage of such a demand-driven
approach is that investment is kept as small as possible: at first, only
reverse engineering activities are performed; renovations are conducted
only after the decision to include the component in the product line.

To enable stakeholders to reason about such a decision, certain aspects of
the component have to be lifted to a higher level of abstraction. Existing
component artifacts (e.g., source code, documentation, configuration
files) are therefore exploited, and the information extracted is
aggregated in a repository. Since the repository usually has a lot of
content, relevant information is often hidden in overcrowded low-level
models. Thus, further analysis activities process the information and aim
at creating meaningful views on the existing component.

Typical goals of the reverse engineering part in ADORE address the
evaluation of the internal quality of a component, the degree of
variability support, the provided flexibility towards anticipated changes,
the compliance of a component to the product line architecture, or, in
case there are several similar implementations of a component, the
identification of commonalities among them.

7.4  Analysis  of  Component 
Adequacy 

An existing implementation of the Graphics component was identified
(written in C++, approximately 180 KLOC) in the domain of the multimedia
system. At the time of the analysis, the component had to be adapted to
deal with a new hardware technology, so the source code was not yet fully
available due to this technology change. The product line architects were
doubtful whether the existing Graphics component was adequate for the
product line and suitable for the architecture designed with PuLSE-DSSA.
Therefore, we instantiated the ADORE approach and reverse engineered the
Graphics component to answer the following questions:

•  Was the component implemented according to its documentation, how
consistent is the documentation, and can the component be integrated
seamlessly into the product line architecture? To answer these questions,
we applied static architecture evaluations.

•  To what degree does the subject component already contain variability?
Is it possible to relate these code-level variations to higher levels (in
the best case, to the product map coming from scoping activities)? To
address this request, we conducted a variability analysis and
prototypically refactored some variability by means of a frame processor.

•  What are the maintenance risks of the current implementation? This
request triggered a set of reverse engineering activities: source code
analysis including clone detection, metric computation, and a naming and
decomposition analysis.

•  Another request of the architects concerned the potential evolution of
the algorithms and the implementation decisions made so far. We conducted
a review of code comments to address this aspect.

7.4.1  Static Architecture Evaluation

The consistency of the component with its documentation was statically
evaluated with the help of the SAVE tool (see [9], based on the idea of
Reflexion models [10]). The component engineering models decomposed the
subject into three internal layers and provided a mapping to the source
code files. Figure 7-2 depicts the results of the evaluation. The
evaluation shows a high degree of consistency so far, since there are
almost no violations of the documented component engineering model
(Layer-1 uses Layer-2, grey arrow, cardinality 1149). There are only two
exceptions: the divergences from Layer-2 to Layer-1 (blue arrow,
cardinality 2) and the absence from Layer-2 to Layer-3. The reason for the
latter is that the component is currently still under development (the
layer was only realized in stubs). The evaluation showed that the
implementation so far did follow the intended design decisions, although
detailed analysis of the layers gave pointers for improvement.
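The comparison behind such an evaluation can be sketched in the spirit of Reflexion models: dependencies extracted from the code are lifted to the layer level and compared with the documented model. The sketch below is not the SAVE tool; the file names, the mapping `LAYER_OF`, and the dependency list are hypothetical and only mimic the pattern of results reported above.

```python
from collections import Counter


def reflexion(allowed, layer_of, file_deps):
    """Classify layer-level dependencies against the documented model."""
    observed = Counter()
    for source, target in file_deps:
        edge = (layer_of[source], layer_of[target])
        if edge[0] != edge[1]:  # ignore dependencies inside one layer
            observed[edge] += 1
    convergences = {e: n for e, n in observed.items() if e in allowed}
    divergences = {e: n for e, n in observed.items() if e not in allowed}
    absences = sorted(e for e in allowed if e not in observed)
    return convergences, divergences, absences


# Documented model: Layer-1 may use Layer-2, Layer-2 may use Layer-3.
ALLOWED = {("Layer-1", "Layer-2"), ("Layer-2", "Layer-3")}

# Hypothetical mapping of source files to layers and extracted dependencies.
LAYER_OF = {
    "ui.cpp": "Layer-1",
    "mask.cpp": "Layer-2",
    "draw.cpp": "Layer-2",
    "hw_stub.cpp": "Layer-3",
}
FILE_DEPS = [
    ("ui.cpp", "mask.cpp"),
    ("ui.cpp", "draw.cpp"),
    ("mask.cpp", "ui.cpp"),    # upward call: a divergence
    ("mask.cpp", "draw.cpp"),  # inside Layer-2: ignored
]
```

The counts per edge correspond to the cardinalities shown on the arrows of such a view, and an allowed edge with no observed dependency shows up as an absence.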


The challenge for the development organization is now to ensure this over
time. The component’s evolution has to be monitored when new variants are
created based on this first product line component. To keep the quality
and to avoid degeneration, we recommended quality assurance activities
including the repetition of static architectural evaluations at defined
checkpoints.


Figure 7-2: Component Internal Layers


Figure  7-3:  Frame  Hierarchy 
7.4.2  Variability Analysis

Conditional compilation with macros is a common means to realize variants
in the source code. Optional or variable code parts or alternative
implementations can be implemented in a common source code base (with
preprocessor commands #if, #ifdef, etc.), and the resolution of the
variants is taken over by the preprocessor. The variability analysis
checked to which extent the macros and compile switches realize
variability with respect to the product map. Identified variability was
documented to make the variation points explicit and derive a decision
model that relates the macros to the different members of the product
line.
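A first step of such a variability analysis is to find the macros that guard conditional code. The following sketch scans C or C++ source text for `#if`, `#ifdef`, `#ifndef`, and `#elif` directives and records where each macro is tested. It is a simplified illustration: the macro names and the sample source are hypothetical, and the pattern ignores details such as line continuations and numeric literals with letters.

```python
import re

# Matches the conditional directives and captures the rest of the line.
DIRECTIVE = re.compile(r"^\s*#\s*(ifdef|ifndef|if|elif)\b(.*)$")
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def variation_macros(source):
    """Map each macro used in a conditional directive to its line numbers."""
    found = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = DIRECTIVE.match(line)
        if not match:
            continue
        for name in IDENTIFIER.findall(match.group(2)):
            if name != "defined":  # the operator, not a macro
                found.setdefault(name, []).append(lineno)
    return found


# Hypothetical source fragment with two variant switches and a debug switch.
SAMPLE = """\
#include "thread.h"
#if defined(TARGET_HW)
void start() { hw_start(); }
#elif defined(SIMULATION)
void start() { sim_start(); }
#endif
#ifndef NDEBUG
void trace();
#endif
"""
```

The resulting map could then be compared against a decision model, for example a table relating `TARGET_HW` and `SIMULATION` to product line members, to spot macros that realize no documented variability.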

Furthermore, we exemplified how to extract the current variabilities and
migrate them into more advanced tools like frame processors. A frame
processor is a tool supporting frame technology [1], a technique to
support reuse in practice. In frame technology, the implementation units,
called frames, have the same appearance as those in any major programming
language. They form a group of symbols (e.g., source code or frame code)
that can be consistently referenced.

Frames contain both source code and frame-specific code providing
adaptation, which enables reuse. Frame-specific code consists of frame
commands and frame variables in order to make variation points explicit by
distinguishing between common and variable text. Frames can be arranged in
hierarchies and will be resolved at compile time by the frame processor,
an advanced preprocessor.

The frame processor processes the frame hierarchy and finally generates
pure source code. In product family engineering, this technique is used
to produce different product instances from a family by means of
explicit variation points in one common code base. Figure 7-3 depicts
the frame hierarchy for operating-system-dependent thread handling for
two system variants: the target variant and a simulation variant running
on Windows. Frames positioned high in the frame hierarchy can adapt
lower frames (by an ADAPT statement in the frame). On the lowest level
are the frames containing the commonalities among both variants
(simulation and target); these frames have explicit variation points.
The variation points are adapted by higher level frames; for instance,
the variation point VP_filename.cpp_1 is replaced in adapt_1.frame with
“#include Windows.h”. The frame hierarchy was extracted automatically
from the source code (leading to non-meaningful names for the variation
points and the lower level “adapt_*” frames).
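The resolution of a frame hierarchy can be illustrated with a minimal
sketch in Python. It does not reproduce the syntax of any actual frame
processor: the Frame class, the ${...} marker notation, and all frame,
header, and variation point names are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical marker syntax for a variation point inside common text.
VP_PATTERN = re.compile(r"\$\{(\w+)\}")

@dataclass
class Frame:
    name: str
    text: str = ""                    # common text with variation points
    adapts: "Frame | None" = None     # lower-level frame adapted by this one
    bindings: dict = field(default_factory=dict)  # variation point -> text

def resolve(frame, inherited=None):
    """Resolve a frame hierarchy top-down into plain source code."""
    # Bindings made higher in the hierarchy override those made lower down.
    bindings = {**frame.bindings, **(inherited or {})}
    if frame.adapts is not None:
        return resolve(frame.adapts, bindings)
    # Lowest level: replace each variation point; unbound points become empty.
    return VP_PATTERN.sub(lambda m: bindings.get(m.group(1), ""), frame.text)

# One common frame, adapted by one frame per variant.
common = Frame("thread_common", text="${VP_INCLUDE}\nvoid start_thread();\n")
simulation = Frame("adapt_simulation", adapts=common,
                   bindings={"VP_INCLUDE": "#include <windows.h>"})
target = Frame("adapt_target", adapts=common,
               bindings={"VP_INCLUDE": '#include "target_os.h"'})

print(resolve(simulation))
print(resolve(target))
```

In this sketch, supporting a further operating system variant would only
require one more adapting frame with its own bindings; the common frame
stays untouched.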

The frame processor enables the explicit management of the frames,
supports the hierarchy, and resolves the variants automatically. Adding
a new variant involves only the creation of the respective frames. In
the example, another OS variant would lead to the respective OS frame
and an additional adapt frame. All variability and its resolution are
localized in these frames, so the developers work only with the two
frames and do not have to modify several files spread across the
implementation.

7.4.3  Clone  Detection 

A code clone is a code fragment that occurs in more than one place. One
duplicate is usually the master, and the other one (i.e., the cloned
one) is produced by copying the master (sometimes with minor
modifications). Code clones are a threat to evolution: they increase
code size, raise the maintenance effort (when a change is needed, all
duplicates have to be addressed), reduce code readability, and increase
risk, because an error can be propagated to several places in the source
code, which makes its removal costly.

We analyzed the source code with a clone detection tool based on text
pattern matching for two objectives. First, we aimed at detecting
internal clones (duplicated code lines found in a single file), and
second, external clones (clones found in different files). The analysis
identified a number of code clones with a size greater than 20 lines,
which are to be reviewed by the developers.
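The distinction between internal and external clones can be sketched in
a few lines of Python. This is not the tool used in the case study; it
is a simplified text-matching detector that reports windows of identical
(whitespace-normalized) lines. The function name, the file names, and
the small window size in the example are assumptions for illustration.

```python
from collections import defaultdict

def find_clones(files, min_lines=20):
    """Return (internal, external) clone pairs found by exact text matching.

    files maps a file name to its source text. A clone is a window of
    min_lines consecutive non-blank lines (whitespace-normalized) that
    occurs at more than one location.
    """
    occurrences = defaultdict(list)  # window text -> [(file, first line no.)]
    for name, text in files.items():
        lines = [(number, " ".join(line.split()))
                 for number, line in enumerate(text.splitlines(), start=1)]
        lines = [(number, line) for number, line in lines if line]
        for i in range(len(lines) - min_lines + 1):
            window = "\n".join(line for _, line in lines[i:i + min_lines])
            occurrences[window].append((name, lines[i][0]))

    internal, external = [], []
    for places in occurrences.values():
        for a in range(len(places)):
            for b in range(a + 1, len(places)):
                pair = (places[a], places[b])
                same_file = places[a][0] == places[b][0]
                (internal if same_file else external).append(pair)
    return internal, external

# Tiny example with a window of 2 lines instead of 20.
example = {
    "a.c": "x = 1;\ny = 2;\nz = 3;\nx = 1;\ny = 2;\n",
    "b.c": "  x = 1;\n  y  =  2;\n",
}
internal, external = find_clones(example, min_lines=2)
print("internal:", internal)
print("external:", external)
```

A fixed window reports a long clone as several overlapping matches; a
practical tool would merge adjacent windows into maximal clones before
handing them to the developers for review.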

7.4.4  Metric  Hotspots 

Source code metrics are an objective means to learn about potentially
problematic areas in the source code. By measuring coupling, size, and
complexity metrics and analyzing the outliers, unanticipated values,
problematic areas, and hotspots in the source code can be identified. In
particular, we had significant outliers for the following metrics:
cyclomatic complexity (for methods and class averages), CBO (coupling
between object classes), NOC (number of derived children), and function
size in terms of LOC (lines of code). The identified source code items
have been flagged for code reviews, since such elements are error-prone,
bring along the risk of unwanted side effects, and are difficult to
understand. To avoid potential maintenance problems, these items are
reviewed carefully.
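The report does not state how outliers were determined. As one possible
reading, the following Python sketch flags metric values above the upper
Tukey fence (third quartile plus 1.5 times the interquartile range). The
fence rule, the function names, and the complexity values are
assumptions for illustration only.

```python
import statistics

def upper_outliers(values, k=1.5):
    """Return the names whose metric value exceeds Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(values.values(), n=4)
    fence = q3 + k * (q3 - q1)
    return sorted(name for name, value in values.items() if value > fence)

# Hypothetical cyclomatic complexity per function.
complexity = {
    "init": 3, "draw_line": 4, "draw_rect": 5, "clear": 4,
    "blit": 6, "flush": 3, "render_scene": 48,
}
print(upper_outliers(complexity))
```

The same function could be applied per metric (CBO, NOC, LOC) to build
the list of items that are handed over to code review.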


7.4.5  Naming  and  Decomposition 
Analysis 

When  conducting  a  detailed  source  code 
analysis,  a  number  of  issues  arose  that 
became  additional  action  items: 

•  File system representation: the folder structure and the code files
did not reflect the decomposition as it was documented.

•  Empty files: a couple of empty (or almost empty) files were
identified (less than 20 LOC). These files are being reviewed to
determine whether they should be merged with other files or eliminated
from the code.

•  Inconsistent naming conventions: although there were naming
conventions in the code, they were not used consistently throughout the
component.

7.4.6  Code  Comments 

A major issue was the ratio of commented lines to source code lines,
which was below 10 percent. The developers agreed to improve this ratio
to make the source code easier to read and to avoid problems when
evolving the components further.
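The report does not define how commented lines were counted. A minimal
sketch in Python, assuming C/C++-style comments and counting a line that
mixes code with a trailing comment as a code line, could look as
follows; the function name and the sample source are hypothetical.

```python
def comment_ratio(source):
    """Return the ratio of comment lines to code lines (blank lines ignored)."""
    comment_lines = code_lines = 0
    in_block = False  # True while inside a /* ... */ block comment
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_block:
            comment_lines += 1
            if "*/" in line:
                in_block = False
        elif line.startswith("//"):
            comment_lines += 1
        elif line.startswith("/*"):
            comment_lines += 1
            in_block = "*/" not in line
        else:
            code_lines += 1
    return comment_lines / code_lines if code_lines else 0.0

sample = (
    "// set up the counters\n"
    "int a = 0;\n"
    "int b = 1; // trailing comment, counted as code\n"
    "int c = 2;\n"
    "return a + b + c;\n"
)
print(f"{comment_ratio(sample):.0%}")
```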

7.5  Summary 

The reverse engineering results revealed that the Graphics component is
sufficiently adequate for the emerging product line, and the product
line architects decided to reuse the existing component. However, the
results also made it obvious that renovations and extensions are needed
to fully address the product line requirements. An action list
(improvement of the internal quality, assurance of consistency with the
documentation and the intended design, refactoring of variabilities,
removal of architectural mismatches, and the integration of the
component) and items for a detailed analysis (code reviews and
inspections) were derived directly from the reverse engineering results.
For the final acceptance of the Graphics component as a product line
component, these issues should be revisited.

The well-invested effort for reverse engineering led to an architectural
reuse decision that was well-grounded and sound, based on the reverse
engineering results. Since the component was not yet fully implemented,
the developers were able to address most of the suggested renovation
items promptly in the ongoing development.

In summary, this experience report presents a typical industrial case
where Fraunhofer PuLSE™ and ADORE™ are applied in combination. As in
this case, there is generally a need for adaptation when components
developed for single systems should become part of a product line
infrastructure. In our experience so far, as-is reuse mostly does not
work, since some adaptation is always needed to make the existing
component suitable for the product line.

Hence, there is a strong need for efficient and focused reverse
engineering analyses that support the reuse decision making, in this
case by analyzing the adequacy of the existing components. In addition,
it is important to identify and estimate the degree of required
adaptation in order to base the product line architects' reuse decision
on a well-grounded foundation.


7.6  References 

[1] P. G. Bassett: Framing Software Reuse: Lessons From The Real World,
Prentice-Hall, 1996.

[2] J. Bayer et al.: “PuLSE: A Methodology to Develop Software Product
Lines,” 5th Symposium on Software Reusability (SSR'99), 1999.

[3] J. Bayer et al.: Definition of Reference Architectures Based on
Existing Systems, (IESE-Report 034.04/E), 2004.

[4] E. Chikofsky, J. H. Cross: Reverse Engineering and Design Recovery:
a Taxonomy, IEEE Software, 7(1):13-17, January 1990.

[5] P. Clements, R. Kazman, M. Klein: Evaluating Software Architectures:
Methods and Case Studies, Addison-Wesley, 2002.

[6] P. Clements, L. M. Northrop: Software Product Lines: Practices and
Patterns, Addison-Wesley, 2001.

[7] C. Hofmeister, R. Nord, D. Soni: Applied Software Architecture,
Addison-Wesley, 1999.

[8] P. Kruchten: The 4+1 View Model of Architecture, IEEE Software,
12(6):42-50, November 1995.

[9] P. Miodonski, T. Forster, J. Knodel, M. Lindvall, D. Muthig:
Evaluation of Software Architectures with Eclipse, Kaiserslautern,
(IESE-Report 107.04/E), 2004.

[10] G. C. Murphy, D. Notkin, K. Sullivan: Software reflexion models:
bridging the gap between source and high-level models, ACM Software
Engineering Notes, 1995.



8  Mining  Existing  Software  Product  Line 
Artifacts  using  Polymorphic  Dependency 
Relations1 


Igor  Ivkovic  and  Kostas  Kontogiannis 

Dept. of Electrical and Computer Engineering
University of Waterloo
Waterloo, ON N2L 3G1 Canada
{iivkovic,  kostas}@swen.uwaterloo.ca 


Abstract 

Development of a product line architecture involves mining existing
software assets, from architecture-level design knowledge to
implementation-level artifacts. Each mining effort is generally
associated with an appropriate mining context, through which the
criteria for component identification and selection are defined. The
crux of the matter is variability, where a mining context has to be
specific, to allow for precise component querying, but it also has to be
adaptable and extensible, to accommodate the needs of different software
product line instances. In this paper, we introduce a framework for
annotating and querying heterogeneous software artifacts using
polymorphic dependency relations in software product line reengineering.
The dependency relations are defined based on the theory of semantic
values, where an association rule is represented as a combination of
different semantic properties and values (semantic contexts), such as
features and constraints defined at the model or metamodel level. The
chosen association rules denoted through a mining context are used to
query individual component annotations.

Keywords:  software  reengineering,  soft¬ 
ware  product  lines,  mining  existing  assets, 
semantic  value  theory,  polymorphic  de¬ 
pendency  relations 

8.1  Introduction 

In model-driven software evolution, software artifact models are changed
at different levels of abstraction. For instance, use case models are
used to apply change at the requirements specification level, while
deployment diagrams are used to manifest change at the deployment and
integration levels. A mutation of an artifact model at one level may
affect models at the same or at different levels of abstraction and
detail. For example, a change in a design model could directly affect
architectural models at the higher level and implementation models at
the lower level of abstraction.

1 This work is funded in part by the IBM Canada Ltd. Laboratory, Center
for Advanced Studies

To enable impact analysis and propagation of changes that may arise due
to evolution, it is necessary to establish and maintain associations
among related models and their elements. In previous research
[3, 4], we have introduced an approach for the identification and
encoding of dependency relations among heterogeneous software artifacts
using formal concept analysis (FCA). As part of the approach, each
software model that is MOF compliant [6] is represented in terms of its
objects and attributes. Objects that share common attributes are
considered dependent, and are identified using an FCA algorithm [2]. To
match attributes of heterogeneous contexts, we have introduced the
notion of attribute association rules. At the domain model or metamodel
levels, attribute associations represent the functions for mapping
compatible types and relations, while at the model level, they are used
to map attributes, features, and annotations of individual model
elements. The approach was applied to establish dependency relations
between business process models and enacting Java source code by mining
information flow models using business workflow patterns.
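The idea that objects sharing common attributes are considered dependent
can be sketched with a naive concept enumeration in Python. This is not
the FCA algorithm referenced in [2]; it is a small closure-based
enumeration suitable only for toy contexts, and the object and attribute
names are hypothetical.

```python
def formal_concepts(context):
    """Return all (extent, intent) pairs of a small formal context.

    context maps each object to the set of attributes it has.
    """
    all_attributes = frozenset().union(*context.values())
    intents = {all_attributes}
    # Every intent is an intersection of object attribute sets; intersecting
    # the known intents with each object in turn yields all of them.
    for attributes in context.values():
        intents |= {intent & attributes for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts

def dependent_groups(context):
    """Return {extent: intent} for concepts that group at least two objects."""
    return {extent: intent for extent, intent in formal_concepts(context)
            if len(extent) >= 2 and intent}

context = {
    "Order (UML)": {"name:Order", "type:Entity"},
    "Order.java": {"name:Order", "type:Class"},
    "Invoice (UML)": {"name:Invoice", "type:Entity"},
}
for extent, intent in dependent_groups(context).items():
    print(sorted(extent), "share", sorted(intent))
```

In this example the UML element and the Java file are grouped because
they share the name attribute, which corresponds to a dependency across
heterogeneous artifacts.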

In this paper, we extend our FCA-based approach by defining dependency
relations using the theory of semantic values [8]. Each dependency
relation represents a composition of individual semantic values. The
meaning of a relation is indicated by an association context, which
represents a set of semantic properties and values. For example, the
semantic values Class(Language=‘UML’(Represents=‘Objects’)) and
class(Language=‘Java’(Represents=‘Objects’)) are associated based on the
context {Represents=‘Objects’} but, in contrast, are not associated
based on the context {Language=‘UML’}. We treat each association context
as a polymorphic type, for which we can derive a subtype by extending
the association context (i.e., reducing its scope), or a supertype by
reducing the association context (i.e., extending its scope).
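One possible reading of association contexts as polymorphic types is
sketched below in Python. The nested notation of the paper is flattened
into simple property/value pairs, which is a simplification, and the
class and function names are assumptions introduced for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SemanticValue:
    value: str
    properties: tuple  # (property, value) pairs, flattened from nested contexts

    def matches(self, context):
        """True if every property required by the context holds for this value."""
        held = dict(self.properties)
        return all(held.get(prop) == val for prop, val in context.items())

def associated(a, b, context):
    """Two semantic values are associated if both satisfy the context."""
    return a.matches(context) and b.matches(context)

def is_subtype(context, other):
    """A context is a subtype of another if it extends it with more properties."""
    return other.items() <= context.items()

uml_class = SemanticValue("Class", (("Language", "UML"), ("Represents", "Objects")))
java_class = SemanticValue("class", (("Language", "Java"), ("Represents", "Objects")))

print(associated(uml_class, java_class, {"Represents": "Objects"}))
print(associated(uml_class, java_class, {"Language": "UML"}))
print(is_subtype({"Represents": "Objects", "Language": "UML"},
                 {"Represents": "Objects"}))
```

Extending a context adds constraints and therefore matches fewer values,
which is why the extended context plays the role of the subtype.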

In the context of software product line reengineering, the polymorphic
types are first used to annotate existing software assets. Then, a
mining context is defined as a combination of different association
contexts, for example, with different contexts for different types of
artifacts. Using the mining context as a basis, candidate components are
mined, and the most suitable are selected for reuse in product line
development.

The remaining content of this paper is organized as follows: Section 8.2
explains semantic annotations of existing assets using semantic heads.
Section 8.3 describes the structure of the mining context and explains
how the mining context is used to query candidate components. Finally,
Section 8.4 provides our conclusions and directions for future research.

8.2  Semantic  Annotation  of 
Software  Artifacts 

The Options Analysis for Reengineering (OAR) approach [1] prescribes
that after creating a mining context, it is necessary to inventory the
available components and identify their functionality, language,
infrastructure support, and interfaces. Based on this inventory,
candidate components can be selected as the ones that match the criteria
of the mining context. However, the OAR description does not provide a
formalism for the specification of component properties, nor does it
provide an algorithm for matching the criteria of the mining context and
individual components.
Figure 8-1: Annotation Context UML Profile


We propose to define a systematic approach to annotation of existing
software assets by associating each available component with a
corresponding semantic head. Each semantic head represents a set of
semantic values, defined according to the theory of semantic values as
described above [8]. As part of our view of software artifacts as
MOF-compliant models, we use the UML 2.0 metamodel [7] as the basis for
representation. Hence, as shown in Figure 8-1, we define the
«semanticHead» stereotype as part of the Annotation Context UML profile.
Each semantic head is associated with one model element, and it contains
zero or more semantic values. The elements of the semantic head are
created as part of the component inventory, and they may include
implemented features such as (Feature=‘DatabaseAccess’,
Limitation=‘DataManipulation’), interface properties such as
(InterfaceType=‘Proxy’(Protocol=‘HTTP’)), language properties such as
(ImplementationLanguage=‘Java’(Dialect=‘Enterprise Java Beans’)), and
environment constraints such as
(PlatformIndependence=‘Yes’(OperatingSystem=‘Windows’)).

8.3  Defining  the  Mining  Context 

Once  we  have  annotated  components  with 
specific  semantic  properties,  we  can  query 
them  to  identify  those  of  specific  interest 
and  suitability  for  reuse  in  reengineering 
towards  product  lines. 

As shown in Figure 8-2, we create the mining context as the collection
of specific association rules. We represent each rule as a collection of
semantic properties and values that are used to query individual
component annotations. For instance, to find all components that use the
HTTP protocol for communication, we could use {Protocol=‘HTTP’} as the
association context. We can also have different association contexts for
different component types, for example, for areas such as database
access, role-based access control, and user interfaces.
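Querying component annotations with a mining context can be sketched as
follows in Python. The paper does not give a matching algorithm, so the
exact-match rule, the flattened semantic heads, and all component and
rule names below are assumptions made for illustration.

```python
def satisfies(semantic_head, association_context):
    """True if the head holds every property/value pair of the context."""
    return all(semantic_head.get(prop) == val
               for prop, val in association_context.items())

def mine(components, mining_context):
    """Return, per association rule, the names of the matching components."""
    return {rule: sorted(name for name, head in components.items()
                         if satisfies(head, context))
            for rule, context in mining_context.items()}

# Hypothetical component inventory: component name -> flattened semantic head.
components = {
    "OrderProxy": {"InterfaceType": "Proxy", "Protocol": "HTTP"},
    "CustomerDao": {"Feature": "DatabaseAccess", "ImplementationLanguage": "Java"},
    "ReportService": {"Protocol": "HTTP", "ImplementationLanguage": "Java"},
}

# Hypothetical mining context: one association context per area of interest.
mining_context = {
    "http": {"Protocol": "HTTP"},
    "database": {"Feature": "DatabaseAccess"},
}

print(mine(components, mining_context))
```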


Figure  8-2:  Mining  Context  as  a  UML  Profile 
[Figure 8-3 shows two cross tables. Model 1 (UML) lists the objects O1
and O2 (UML Class) and O3 and O4 (UML Association) against the
attributes Name (UML) and Type (UML). Model 2 (Architectural Description
Language (ADL)) lists the objects O1' and O2' (ADL Component) and O3' and
O4' (ADL Connector) against the attributes Name (ADL) and Type (ADL).
The heterogeneous attributes of the two models are mapped using
attribute association rules AR.]

Figure 8-3: Attribute Association Rules for Mapping Heterogeneous
Semantic Values


For compatible properties that are at different levels of granularity or
scale, conversion functions may be used. For instance, the contexts
(ImplementationLanguage=‘Java’) and
(ImplementationLanguage=‘ObjectOriented’) can be mapped using
cvtImplementationLanguage to represent Java at a higher level of
granularity as an object-oriented language. Also, the semantic value
30(Metric=‘AccessPerformance’, MetricScale=‘milliseconds’) can be
converted into 0.03(Metric=‘AccessPerformance’, MetricScale=‘seconds’)
by using the scaling function cvtMetricScale with the scaling factor
1000.
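The two kinds of conversion functions can be sketched in Python as
follows. The paper names the functions but does not define them, so the
lookup tables and signatures are assumptions; the sketch uses the
standard relation of 1000 milliseconds per second.

```python
import math

# Hypothetical generalization table; the paper names only the Java example.
LANGUAGE_PARADIGM = {"Java": "ObjectOriented"}

# Number of units that make up one second, per metric scale.
UNITS_PER_SECOND = {"milliseconds": 1000, "seconds": 1}

def cvt_implementation_language(language):
    """Map a language to its coarser-grained paradigm, if one is known."""
    return LANGUAGE_PARADIGM.get(language, language)

def cvt_metric_scale(value, source_scale, target_scale):
    """Convert a time-based metric value between scales."""
    return value * UNITS_PER_SECOND[target_scale] / UNITS_PER_SECOND[source_scale]

print(cvt_implementation_language("Java"))
print(cvt_metric_scale(30, "milliseconds", "seconds"))
```

After conversion, two semantic values can be compared under a common
context, for example by checking that both report the same paradigm or
that both metrics are expressed in seconds.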

For incompatible properties, such as properties from different domains
as shown in Figure 8-3, conversion may be performed using attribute
association rules including:

•  Feature hierarchies, where contexts are matched if they are related
to a selected feature or one of its sub-features.

•  Lexicographical matching, where contexts are matched as text using
various information retrieval techniques such as n-gram matching,
word-matrix matching, or latent-semantic indexing.

•  Spatial matching, where contexts are matched based on their relation
to a specific data flow.
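As an illustration of lexicographical matching, the following Python
sketch compares two property values by the overlap of their character
trigrams. The paper does not prescribe a similarity measure; the Jaccard
coefficient, the function names, and the sample values are assumptions.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a string (case-insensitive)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def ngram_similarity(a, b, n=3):
    """Jaccard similarity of the character n-gram sets of two strings."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)

print(ngram_similarity("DatabaseAccess", "DataAccess"))
print(ngram_similarity("DatabaseAccess", "UserInterface"))
```

Two contexts could then be treated as matching when the similarity of
their values exceeds a threshold chosen for the inventory at hand.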

8.4  Conclusions  and  Future 
Research 

In this paper, we have presented a framework for defining the mining
context for mining existing software assets using the theory of semantic
values. We have presented an approach for annotating available
components with corresponding semantic properties and values. We have
also discussed the creation of the mining context using association
rules, and the use of the defined association rules to query and select
suitable components.

In future research, we aim to extend the approach by more formally
specifying the annotations and annotation categories. We also intend to
adapt the approach to other mining steps as described in the OAR
approach, such as component refactoring, and relate it to more recent
reengineering methods such as the Service-Oriented Migration and Reuse
Technique (SMART) [5].
9  Workshop  Outcomes

The  results  of  this  workshop  consist  of  the  papers  presented  during  it  (included  in  Sections 
4-8),  the  outcomes  of  discussions  triggered  by  those  papers,  and  some  agreements  related  to 
the  organization  and  continuation  of  this  workshop  and  research  topic. 

The papers that were presented could roughly be said to address two main issues:

1. variation points (papers by Olumofin and Cornelissen)

2. identification of product line assets (papers by Ganesan, Knodel, and Ivkovic)

Part of the research is focused on the application and extension of existing reverse
engineering and reengineering techniques to a product line context. The problem that needs
to be solved is how to use these techniques (that were previously applied only to individual
systems) to handle multiple software variants.

For instance, in the case of dynamic analysis, the use of reverse engineering and
reengineering techniques implies additional difficulties with respect to the determination of
useful scenarios to obtain traces from different variants of a software system. Cornelissen
used this approach for the detection of potential variation points in a product line
architecture.

Interestingly, the technique presented by Olumofin requires exactly that information about
variation points. When combined with knowledge on sensitivity points (which can be
discovered using the SEI Architecture Tradeoff Analysis Method® [ATAM®] developed by the
Carnegie Mellon® Software Engineering Institute [SEI]), evolvability points can be identified
that deserve special attention during software evolution to ensure the architectural
conformance of the product line architecture and product family members.

Most approaches, however, focused on the use of static information. Two approaches were
presented to find the software components that should be considered reusable assets for a
product line. The approach by Ganesan and Knodel is based on metrics, while the approach
proposed by Ivkovic uses semantic annotations to find the components of interest. Finally,
an approach was presented that combines several techniques, such as metrics and clone
detection, to assess the extent to which an existing software component is suitable for
reuse in a product line environment.

Various problems involving the introduction of software product lines in software
development organizations were covered in this workshop. The focus was on the

•  detection of variability and how to use this information to ensure successful software
evolution

•  identification of components that are amenable to becoming product line assets

•  determination of the extent to which these components are suited to be product line assets
as is

® Architecture Tradeoff Analysis Method, ATAM, and Carnegie Mellon are registered in the U.S.
Patent and Trademark Office by Carnegie Mellon University.

Other  issues  that  still  need  to  be  investigated  in  the  future  include 

•  the  derivation  of  a  complete  product  line  architecture  (instead  of  focusing  on  identifying 
individual  components) 

•  combinations  of  different  approaches,  such  as  dynamic  analysis  and  metrics  or  other 
static  approaches 

•  the  process  steps  involved  in  the  migration  towards  a  product  line  approach 

•  the  scalability  of  the  proposed  approaches  and  potential  tool  support  for  them,  as  product 
line  architectures  typically  concern  large  systems 

•  the  reusability  of  other  artifacts  after  migration  to  a  product  line  approach,  such  as  test 
cases  and  architectural  views 

•  traceability  from  legacy  to  product  line  artifacts 

One  more  important  issue  came  up:  participants  found  it  difficult  to  find  suitable  case  studies 
for  applying  their  techniques.  Such  case  studies  should  consider  not  only  the  availability  of  a 
complete  product  line  example  system  but  also  a  set  of  existing  software  variants  that  can  be 
migrated  to  a  software  product  line. 

As  a  follow-up  of  this  workshop,  a  mailing  list  was  set  up  (r2pl@st.ewi.tudelft.nl)  and  the 
need  for  a  successor  workshop  in  2006  was  confirmed. 